🏅 Starting Info
¶Symmetric Mean Absolute Percentage Error (SMAPE): SMAPE is a metric used to measure the accuracy of a model's forecasts. It calculates the percentage difference between the predicted and actual values while considering their average. SMAPE provides a symmetric measurement that avoids the issue of asymmetry present in other metrics like MAPE.
Mean Absolute Error (MAE): MAE is the average of the absolute differences between the predicted and actual values. It gives an idea of how wrong the predictions were, with equal weight given to all errors.
Mean Squared Error (MSE): MSE is similar to MAE but squares the differences before averaging them. It assigns higher weight to large errors, making it more sensitive to outliers.
Root Mean Squared Error (RMSE): RMSE is the square root of MSE and has the same units as the original values. It is an interpretable measure that represents the standard deviation of the unexplained variance.
R-squared (R2): R2 provides an indication of the goodness of fit of a model's predictions to the actual values. It measures the proportion of variance in the data explained by the model, with a value of 1 indicating a perfect fit.
Mean Absolute Percentage Error (MAPE): MAPE is similar to SMAPE, but it can be asymmetrical, penalizing under- or over-forecasts differently. It depends on the distribution of the actual values and can be sensitive to extreme values.
Linear Regression: Linear regression models the relationship between the dependent variable and one or more independent variables by fitting a linear equation. It assumes a linear relationship between the predictors and the target variable.
from sklearn.linear_model import LinearRegression
Polynomial Regression: Polynomial regression extends linear regression by introducing polynomial terms to capture nonlinear relationships between the predictors and the target variable.
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
Ridge Regression: Ridge regression is a regularized version of linear regression that adds a penalty term to the loss function. It helps to reduce overfitting and improve generalization by shrinking the coefficients towards zero.
from sklearn.linear_model import Ridge
Lasso Regression: Lasso regression, similar to ridge regression, adds a penalty term to the loss function. However, it uses L1 regularization, which can perform variable selection by setting some coefficients to exactly zero.
from sklearn.linear_model import Lasso
Elastic Net Regression: Elastic Net regression combines L1 and L2 regularization to balance the strengths of ridge and lasso regression. It can handle correlated predictors better than lasso regression alone.
from sklearn.linear_model import ElasticNet
Decision Tree Regression: Decision tree regression models the target variable by recursively partitioning the feature space into regions based on feature values. Each partition represents a leaf node with a predicted value.
from sklearn.tree import DecisionTreeRegressor
Random Forest Regression: Random forest regression is an ensemble technique that combines multiple decision trees. It improves prediction accuracy by averaging the predictions of individual trees.
from sklearn.ensemble import RandomForestRegressor
Gradient Boosting Regression: Gradient boosting regression builds an ensemble of weak prediction models, such as decision trees, in a sequential manner. Each subsequent model corrects the errors made by the previous models, leading to improved predictions.
from sklearn.ensemble import GradientBoostingRegressor
Support Vector Regression (SVR): SVR is an extension of support vector machines for regression tasks. It uses a kernel function to map the input space into a higher-dimensional feature space, allowing for nonlinear regression.
from sklearn.svm import SVR
Neural Network Regression: Neural network regression utilizes deep learning architectures to model complex relationships between predictors and the target variable. It consists of multiple interconnected layers of artificial neurons that learn to approximate the target function.
from sklearn.neural_network import MLPRegressor
K-Nearest Neighbors Regression (KNN): KNN regression predicts the target value based on the average of the target values of its k nearest neighbors in the feature space.
from sklearn.neighbors import KNeighborsRegressor
Bayesian Regression: Bayesian regression applies Bayesian inference to estimate the parameters of a regression model. It provides a probabilistic framework for incorporating prior knowledge and uncertainty in the model.
from sklearn.linear_model import BayesianRidge
Gaussian Process Regression: Gaussian process regression models the target variable as a distribution over functions. It provides a nonparametric approach to regression that captures uncertainty in predictions.
from sklearn.gaussian_process import GaussianProcessRegressor
Generalized Linear Models (GLMs): GLMs are a broad class of regression models that unify various regression techniques, including linear regression, logistic regression, and Poisson regression. They provide a flexible framework for modeling different types of data and response variables.
from sklearn.linear_model import TweedieRegressor # Example of GLM
Huber Regression: Huber regression is a robust regression model that is less sensitive to outliers compared to ordinary least squares regression.
from sklearn.linear_model import HuberRegressor
Passive Aggressive Regression: Passive Aggressive regression is an online learning algorithm that updates its model incrementally, suitable for scenarios where data arrives in streams.
from sklearn.linear_model import PassiveAggressiveRegressor
Isotonic Regression: Isotonic regression is a nonparametric regression model that fits a monotonic function to the data.
from sklearn.isotonic import IsotonicRegression
Orthogonal Matching Pursuit (OMP): OMP is a sparse regression model that iteratively selects relevant features and fits the model using least squares.
from sklearn.linear_model import OrthogonalMatchingPursuit
XGBoost Regression: XGBoost is an optimized gradient boosting regression model known for its high performance and scalability.
import xgboost as xgb
LightGBM Regression: LightGBM is another gradient boosting regression model that offers high efficiency and handles large-scale data.
import lightgbm as lgb
CatBoost Regression: CatBoost is a gradient boosting regression model that supports categorical features and incorporates innovative techniques.
from catboost import CatBoostRegressor
HistGradientBoosting Regression: HistGradientBoosting is a histogram-based gradient boosting regression model that provides fast and accurate predictions.
from sklearn.experimental import enable_hist_gradient_boosting
from sklearn.ensemble import HistGradientBoostingRegressor
ARIMA (Autoregressive Integrated Moving Average): ARIMA is a time series forecasting model that combines autoregressive and moving average components.
from statsmodels.tsa.arima.model import ARIMA
Prophet: Prophet is a time series forecasting model developed by Facebook that captures seasonal and trend patterns.
from prophet import Prophet
🛫 Imports
¶'''
--------------------------------------------------------
REGRESSION MODELS
--------------------------------------------------------
'''
from sklearn.ensemble import BaggingRegressor
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import HuberRegressor
from sklearn.linear_model import PassiveAggressiveRegressor
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.linear_model import ElasticNet
from sklearn.linear_model import BayesianRidge
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.neighbors import KNeighborsRegressor
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from catboost import CatBoostRegressor
from prophet import Prophet
'''
--------------------------------------------------------
CLASSIFICATION MODELS
--------------------------------------------------------
'''
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn import svm
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import AdaBoostClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.gaussian_process import GaussianProcessClassifier
'''
--------------------------------------------------------
FEAUTURE ENGINEERING
--------------------------------------------------------
'''
from nltk.stem import WordNetLemmatizer
from nltk import pos_tag
from nltk.corpus import stopwords
from nltk.corpus import wordnet
from sklearn.feature_extraction.text import CountVectorizer
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import PowerTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.metrics import plot_confusion_matrix, classification_report
from sklearn.pipeline import make_pipeline
from imblearn.over_sampling import SMOTE
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.preprocessing import MinMaxScaler
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder, OneHotEncoder, StandardScaler, RobustScaler, QuantileTransformer, KBinsDiscretizer, PolynomialFeatures
from sklearn.feature_selection import SelectKBest, SelectPercentile, SelectFromModel, RFE, RFECV, VarianceThreshold
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.decomposition import PCA, KernelPCA, NMF, TruncatedSVD, FactorAnalysis, FastICA, SparsePCA, DictionaryLearning, IncrementalPCA
'''
--------------------------------------------------------
OTHER
--------------------------------------------------------
'''
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score, roc_auc_score
import category_encoders as ce
from sklearn.pipeline import Pipeline
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from datetime import datetime
import seaborn as sns
import plotly.express as px
import pandas as pd
from functools import partial
import catboost as cb
from DataScienceMethods import Plots, Information
from category_encoders import MEstimateEncoder, GLMMEncoder, OrdinalEncoder, CatBoostEncoder
import shap
import warnings
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
import optuna
from scipy import stats
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import xgboost as xgb
def ignoreWarnings():
warnings.filterwarnings('ignore')
ignoreWarnings()
📊 Exploratory Data Analysis
¶training = pd.read_csv('Training.csv', delimiter=';')
testing = pd.read_csv('Testing.csv', delimiter=';')
#Unlock display limit
pd.set_option('display.max_columns', None)
numericColumns = ['ActualWeightFront','ActualWeightBack','ActualWeightTotal','WheelBase','Overhang']
numericColumnsNoTarget = ['WheelBase','Overhang']
targetColumns = ['ActualWeightTotal','ActualWeightFront','ActualWeightBack']
target_ActualWeightBack = training['ActualWeightBack']
target_ActualWeightFront = training['ActualWeightFront']
target_ActualWeightTotal = training['ActualWeightTotal']
Let's check to see if there are any null values
#Count null values
print(training.isnull().sum())
training.head()
TruckSID 0 ActualWeightFront 0 ActualWeightBack 0 ActualWeightTotal 0 Engine 0 Transmission 0 FrontAxlePosition 0 WheelBase 0 Overhang 0 FrameRails 0 Liner 0 FrontEndExt 0 Cab 0 RearAxels 0 RearSusp 0 FrontSusp 0 RearWheels 0 RearTires 0 FrontWheels 0 FrontTires 0 TagAxle 0 EngineFamily 0 TransmissionFamily 0 dtype: int64
| TruckSID | ActualWeightFront | ActualWeightBack | ActualWeightTotal | Engine | Transmission | FrontAxlePosition | WheelBase | Overhang | FrameRails | Liner | FrontEndExt | Cab | RearAxels | RearSusp | FrontSusp | RearWheels | RearTires | FrontWheels | FrontTires | TagAxle | EngineFamily | TransmissionFamily | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 31081 | 11280 | 8030 | 19310 | 1012011 | 2700028 | 3690005 | 249 | 104 | 403012 | 404002 | 4070004 | 5000002 | 330444 | 3500004 | 3700002 | 9140014 | 933469 | 9050015 | 930469 | 3P1998 | 101D100 | 270C25 |
| 1 | 30580 | 10720 | 6660 | 17380 | 1012011 | 2700022 | 3690005 | 183 | 68 | 403012 | 404002 | 4070004 | 5000004 | 330507 | 3500004 | 3700011 | 9142001 | 933469 | 9050031 | 930821 | 3P1998 | 101D100 | 270C24 |
| 2 | 31518 | 11040 | 6230 | 17270 | 1012001 | 2700022 | 3690005 | 216 | 68 | 403012 | 404002 | 4070004 | 5000001 | 330444 | 3500004 | 3700002 | 9140014 | 933062 | 9050015 | 930469 | 3P1998 | 101D97 | 270C24 |
| 3 | 31816 | 11210 | 7430 | 18640 | 1012002 | 2700028 | 3690005 | 219 | 104 | 403012 | 404002 | 4070004 | 5000002 | 330444 | 3500004 | 3700002 | 9140014 | 933062 | 9050015 | 930469 | 3P1998 | 101D97 | 270C25 |
| 4 | 30799 | 11910 | 7510 | 19420 | 1012019 | 2700028 | 3690005 | 231 | 104 | 403012 | 404002 | 4070004 | 5000001 | 330444 | 3500004 | 3700011 | 9142001 | 933469 | 9050037 | 930469 | 3P1998 | 101D102 | 270C25 |
print(testing.isnull().sum())
testing.head()
TruckSID 0 Engine 0 Transmission 0 FrontAxlePosition 0 WheelBase 0 Overhang 0 FrameRails 0 Liner 0 FrontEndExt 0 Cab 0 RearAxels 0 RearSusp 0 FrontSusp 0 RearWheels 0 RearTires 0 FrontWheels 0 FrontTires 0 TagAxle 0 EngineFamily 0 TransmissionFamily 0 dtype: int64
| TruckSID | Engine | Transmission | FrontAxlePosition | WheelBase | Overhang | FrameRails | Liner | FrontEndExt | Cab | RearAxels | RearSusp | FrontSusp | RearWheels | RearTires | FrontWheels | FrontTires | TagAxle | EngineFamily | TransmissionFamily | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 35433 | 1012003 | 2700028 | 3690005 | 207 | 98 | 403012 | 404998 | 4070004 | 5000001 | 330444 | 3500004 | 3700002 | 9140022 | 933062 | 9050037 | 930469 | 3P1998 | 101D97 | 270C25 |
| 1 | 31091 | 1012019 | 2700028 | 3690005 | 201 | 62 | 403011 | 404002 | 4070004 | 5000004 | 330507 | 3500004 | 3700011 | 9142001 | 933469 | 9050037 | 930469 | 3P1998 | 101D102 | 270C25 |
| 2 | 26771 | 1012002 | 2700022 | 3690005 | 213 | 62 | 403011 | 404002 | 4070004 | 5000001 | 330444 | 3500004 | 3700002 | 9142001 | 933469 | 9050031 | 930821 | 3P1998 | 101D97 | 270C24 |
| 3 | 29201 | 1012003 | 2700022 | 3690005 | 192 | 110 | 403012 | 404998 | 4070004 | 5000001 | 3300041 | 3500014 | 3700002 | 9140005 | 933062 | 905549 | 930821 | 3P1998 | 101D97 | 270C24 |
| 4 | 31083 | 1012011 | 2700028 | 3690005 | 249 | 104 | 403012 | 404002 | 4070004 | 5000002 | 330444 | 3500004 | 3700002 | 9140014 | 933469 | 9050015 | 930469 | 3P1998 | 101D100 | 270C25 |
There are no missing values in the training or testing dataset so we can then check to see which columns are missing from the training set
These will be our targets.
print(f'''
The shape of the training dataset is: {training.shape}
The shape of the testing dataset is: {testing.shape}
The training dataset is missing the following columns:
{set( training.columns ) - set( testing.columns )}
''')
The shape of the training dataset is: (2644, 23)
The shape of the testing dataset is: (962, 20)
The training dataset is missing the following columns:
{'ActualWeightBack', 'ActualWeightFront', 'ActualWeightTotal'}
This is a very small dataset so we can rule out deep learning.
training.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 2644 entries, 0 to 2643 Data columns (total 23 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 TruckSID 2644 non-null int64 1 ActualWeightFront 2644 non-null int64 2 ActualWeightBack 2644 non-null int64 3 ActualWeightTotal 2644 non-null int64 4 Engine 2644 non-null int64 5 Transmission 2644 non-null int64 6 FrontAxlePosition 2644 non-null int64 7 WheelBase 2644 non-null int64 8 Overhang 2644 non-null int64 9 FrameRails 2644 non-null int64 10 Liner 2644 non-null int64 11 FrontEndExt 2644 non-null int64 12 Cab 2644 non-null int64 13 RearAxels 2644 non-null int64 14 RearSusp 2644 non-null int64 15 FrontSusp 2644 non-null int64 16 RearWheels 2644 non-null int64 17 RearTires 2644 non-null int64 18 FrontWheels 2644 non-null int64 19 FrontTires 2644 non-null int64 20 TagAxle 2644 non-null object 21 EngineFamily 2644 non-null object 22 TransmissionFamily 2644 non-null object dtypes: int64(20), object(3) memory usage: 475.2+ KB
testing.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 962 entries, 0 to 961 Data columns (total 20 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 TruckSID 962 non-null int64 1 Engine 962 non-null int64 2 Transmission 962 non-null int64 3 FrontAxlePosition 962 non-null int64 4 WheelBase 962 non-null int64 5 Overhang 962 non-null int64 6 FrameRails 962 non-null int64 7 Liner 962 non-null int64 8 FrontEndExt 962 non-null int64 9 Cab 962 non-null object 10 RearAxels 962 non-null int64 11 RearSusp 962 non-null int64 12 FrontSusp 962 non-null object 13 RearWheels 962 non-null int64 14 RearTires 962 non-null int64 15 FrontWheels 962 non-null int64 16 FrontTires 962 non-null int64 17 TagAxle 962 non-null object 18 EngineFamily 962 non-null object 19 TransmissionFamily 962 non-null object dtypes: int64(15), object(5) memory usage: 150.4+ KB
'''
Take only the numeric columns for analysis
'''
training_numeric = training[numericColumns]
testing_numeric = testing[numericColumnsNoTarget]
training_numeric
| ActualWeightFront | ActualWeightBack | ActualWeightTotal | WheelBase | Overhang | |
|---|---|---|---|---|---|
| 0 | 11280 | 8030 | 19310 | 249 | 104 |
| 1 | 10720 | 6660 | 17380 | 183 | 68 |
| 2 | 11040 | 6230 | 17270 | 216 | 68 |
| 3 | 11210 | 7430 | 18640 | 219 | 104 |
| 4 | 11910 | 7510 | 19420 | 231 | 104 |
| ... | ... | ... | ... | ... | ... |
| 2639 | 10110 | 9830 | 19940 | 210 | 104 |
| 2640 | 11150 | 6700 | 17850 | 210 | 74 |
| 2641 | 10850 | 7020 | 17870 | 222 | 80 |
| 2642 | 10380 | 6850 | 17230 | 222 | 56 |
| 2643 | 9820 | 8760 | 18580 | 198 | 104 |
2644 rows × 5 columns
training.nunique()
TruckSID 2640 ActualWeightFront 350 ActualWeightBack 441 ActualWeightTotal 520 Engine 12 Transmission 5 FrontAxlePosition 2 WheelBase 35 Overhang 13 FrameRails 3 Liner 3 FrontEndExt 2 Cab 5 RearAxels 4 RearSusp 4 FrontSusp 4 RearWheels 10 RearTires 5 FrontWheels 11 FrontTires 4 TagAxle 9 EngineFamily 9 TransmissionFamily 4 dtype: int64
There seem to be a lot of categorical columns which are considered ints
dataframeWithoutTarget = training.drop(columns=targetColumns)
#Get all the object columns
obj_cols_ = [col for col in training.columns if col not in numericColumns]
training[obj_cols_] = training[obj_cols_].astype('string')
testing[obj_cols_] = testing[obj_cols_].astype('string')
training.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 2644 entries, 0 to 2643 Data columns (total 23 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 TruckSID 2644 non-null string 1 ActualWeightFront 2644 non-null int64 2 ActualWeightBack 2644 non-null int64 3 ActualWeightTotal 2644 non-null int64 4 Engine 2644 non-null string 5 Transmission 2644 non-null string 6 FrontAxlePosition 2644 non-null string 7 WheelBase 2644 non-null int64 8 Overhang 2644 non-null int64 9 FrameRails 2644 non-null string 10 Liner 2644 non-null string 11 FrontEndExt 2644 non-null string 12 Cab 2644 non-null string 13 RearAxels 2644 non-null string 14 RearSusp 2644 non-null string 15 FrontSusp 2644 non-null string 16 RearWheels 2644 non-null string 17 RearTires 2644 non-null string 18 FrontWheels 2644 non-null string 19 FrontTires 2644 non-null string 20 TagAxle 2644 non-null string 21 EngineFamily 2644 non-null string 22 TransmissionFamily 2644 non-null string dtypes: int64(5), string(18) memory usage: 475.2 KB
Cab and FrontSusp are objects in the testing data but ints in the training data.
This implies that the testing data might be dirty or hold values that do not exist in the training data.
cols_in_training = training.columns.to_list()
#Exclude the TruckSID column
cols_in_testing = testing.columns.to_list()[1:]
#Get all the columns that are not numeric
cols_in_training = [item for item in cols_in_training if item not in numericColumns]
cols_in_testing = [item for item in cols_in_testing if item not in numericColumns]
for col in cols_in_testing:
training_values = set(training[col].unique())
testing_values = set(testing[col].unique())
# Find values in testing that are not in training
diff_values = testing_values - training_values
# Print out the different values
for val in diff_values:
print(f"'{val}' from '{col}' is not in the training set")
'2700023' from 'Transmission' is not in the training set '500QXX' from 'Cab' is not in the training set '330545' from 'RearAxels' is not in the training set '370QXX' from 'FrontSusp' is not in the training set '9140019' from 'RearWheels' is not in the training set '914105' from 'RearWheels' is not in the training set
There appear to be some new categories which look very suspicious. Some might be new but others are probably misspellings.
When we encode the category variables I will use
training['EngineFamily'].unique()
<StringArray> [ '101D100', '101D97', '101D102', '101D97 ', '101D97.', '101D56', '101D69', '101.D100', '101D67'] Length: 9, dtype: string
There seem to be some '.' in the data, these appear to be mistakes so we will quickly get rid of them.
all_cols = training.columns
for col in all_cols:
try:
training[col] = training[col].str.replace(' ', '')
except:
pass
training['EngineFamily'] = training['EngineFamily'].str.replace('.', '')
training['EngineFamily'].unique()
<StringArray> ['101D100', '101D97', '101D102', '101D56', '101D69', '101D67'] Length: 6, dtype: string
All anomalies have been dealt with.
Let's continue looking at the data
from DataScienceMethods import Information, Plots
Information.summary(training_numeric)
data shape: (2644, 5)
| data type | #missing | %missing | #unique | min | max | first value | second value | third value | |
|---|---|---|---|---|---|---|---|---|---|
| ActualWeightFront | int64 | 0 | 0.0 | 350 | 7801.0 | 12890.0 | 11280 | 10720 | 11040 |
| ActualWeightBack | int64 | 0 | 0.0 | 441 | 4650.0 | 10030.0 | 8030 | 6660 | 6230 |
| ActualWeightTotal | int64 | 0 | 0.0 | 520 | 15721.0 | 20640.0 | 19310 | 17380 | 17270 |
| WheelBase | int64 | 0 | 0.0 | 35 | 162.0 | 285.0 | 249 | 183 | 216 |
| Overhang | int64 | 0 | 0.0 | 13 | 56.0 | 3618545.0 | 104 | 68 | 68 |
3 million seems a bit excessive for an overhang
Information.summary(testing_numeric)
data shape: (962, 2)
| data type | #missing | %missing | #unique | min | max | first value | second value | third value | |
|---|---|---|---|---|---|---|---|---|---|
| WheelBase | int64 | 0 | 0.0 | 42 | 150.0 | 285.0 | 207 | 201 | 213 |
| Overhang | int64 | 0 | 0.0 | 14 | 56.0 | 3618545.0 | 98 | 62 | 62 |
sns.boxplot(y=training_numeric['Overhang'])
plt.title('Box and Whisker Plot')
plt.show()
The testing data also has the large outlier.
This means this might be a common occurence in the data.
So instead of dropping these outliers, let's rather convert them to the average (That is present without the outliers) which is 90
'''
Replace all outliers with a given value (90)
'''
def replaceOutliers(data, variable, multiplier=1.5, replacement_value=90):
Q1 = data[variable].quantile(0.25) #Specify the first quartile
Q3 = data[variable].quantile(0.75) #Specify the third quartile
IQR = Q3 - Q1 #Interquartile range
lower_bound = Q1 - multiplier*IQR #Specify the lower bound
upper_bound = Q3 + multiplier*IQR #Specify the upper bound
print('Lower bound: ', lower_bound)
print('Upper bound: ', upper_bound)
# Replacing outliers with the given value
data.loc[data[variable] < lower_bound, variable] = replacement_value
data.loc[data[variable] > upper_bound, variable] = replacement_value
return data
training_numeric = replaceOutliers(training_numeric, 'Overhang', 10)
Information.summary(training_numeric)
Lower bound: -160.0 Upper bound: 344.0 data shape: (2644, 5)
| data type | #missing | %missing | #unique | min | max | first value | second value | third value | |
|---|---|---|---|---|---|---|---|---|---|
| ActualWeightFront | int64 | 0 | 0.0 | 350 | 7801.0 | 12890.0 | 11280 | 10720 | 11040 |
| ActualWeightBack | int64 | 0 | 0.0 | 441 | 4650.0 | 10030.0 | 8030 | 6660 | 6230 |
| ActualWeightTotal | int64 | 0 | 0.0 | 520 | 15721.0 | 20640.0 | 19310 | 17380 | 17270 |
| WheelBase | int64 | 0 | 0.0 | 35 | 162.0 | 285.0 | 249 | 183 | 216 |
| Overhang | int64 | 0 | 0.0 | 12 | 56.0 | 128.0 | 104 | 68 | 68 |
variables = training_numeric.columns
for col in variables:
plt.figure(figsize=(10, 6))
# Histogram
sns.histplot(training_numeric[col], bins=30, kde=True) # kde=True will also plot the kernel density estimate
plt.title(f'Distribution of {col}')
plt.xlabel(col)
plt.ylabel('Frequency')
plt.grid(True)
plt.show()
The targets follow a normal distribution while the other two variables don't really.
training = replaceOutliers(training,'Overhang', 10)
testing = replaceOutliers(testing,'Overhang', 10)
Information.summary(training)
Lower bound: -160.0 Upper bound: 344.0 Lower bound: -160.0 Upper bound: 344.0 data shape: (2644, 23)
| data type | #missing | %missing | #unique | min | max | first value | second value | third value | |
|---|---|---|---|---|---|---|---|---|---|
| TruckSID | string | 0 | 0.0 | 2640 | NaN | NaN | 31081 | 30580 | 31518 |
| ActualWeightFront | int64 | 0 | 0.0 | 350 | 7801.0 | 12890.0 | 11280 | 10720 | 11040 |
| ActualWeightBack | int64 | 0 | 0.0 | 441 | 4650.0 | 10030.0 | 8030 | 6660 | 6230 |
| ActualWeightTotal | int64 | 0 | 0.0 | 520 | 15721.0 | 20640.0 | 19310 | 17380 | 17270 |
| Engine | string | 0 | 0.0 | 12 | NaN | NaN | 1012011 | 1012011 | 1012001 |
| Transmission | string | 0 | 0.0 | 5 | NaN | NaN | 2700028 | 2700022 | 2700022 |
| FrontAxlePosition | string | 0 | 0.0 | 2 | NaN | NaN | 3690005 | 3690005 | 3690005 |
| WheelBase | int64 | 0 | 0.0 | 35 | 162.0 | 285.0 | 249 | 183 | 216 |
| Overhang | int64 | 0 | 0.0 | 12 | 56.0 | 128.0 | 104 | 68 | 68 |
| FrameRails | string | 0 | 0.0 | 3 | NaN | NaN | 403012 | 403012 | 403012 |
| Liner | string | 0 | 0.0 | 3 | NaN | NaN | 404002 | 404002 | 404002 |
| FrontEndExt | string | 0 | 0.0 | 2 | NaN | NaN | 4070004 | 4070004 | 4070004 |
| Cab | string | 0 | 0.0 | 5 | NaN | NaN | 5000002 | 5000004 | 5000001 |
| RearAxels | string | 0 | 0.0 | 4 | NaN | NaN | 330444 | 330507 | 330444 |
| RearSusp | string | 0 | 0.0 | 4 | NaN | NaN | 3500004 | 3500004 | 3500004 |
| FrontSusp | string | 0 | 0.0 | 4 | NaN | NaN | 3700002 | 3700011 | 3700002 |
| RearWheels | string | 0 | 0.0 | 10 | NaN | NaN | 9140014 | 9142001 | 9140014 |
| RearTires | string | 0 | 0.0 | 5 | NaN | NaN | 933469 | 933469 | 933062 |
| FrontWheels | string | 0 | 0.0 | 11 | NaN | NaN | 9050015 | 9050031 | 9050015 |
| FrontTires | string | 0 | 0.0 | 4 | NaN | NaN | 930469 | 930821 | 930469 |
| TagAxle | string | 0 | 0.0 | 7 | NaN | NaN | 3P1998 | 3P1998 | 3P1998 |
| EngineFamily | string | 0 | 0.0 | 6 | NaN | NaN | 101D100 | 101D100 | 101D97 |
| TransmissionFamily | string | 0 | 0.0 | 2 | NaN | NaN | 270C25 | 270C24 | 270C24 |
Information.summary(testing)
data shape: (962, 20)
| data type | #missing | %missing | #unique | min | max | first value | second value | third value | |
|---|---|---|---|---|---|---|---|---|---|
| TruckSID | string | 0 | 0.0 | 962 | NaN | NaN | 35433 | 31091 | 26771 |
| Engine | string | 0 | 0.0 | 12 | NaN | NaN | 1012003 | 1012019 | 1012002 |
| Transmission | string | 0 | 0.0 | 6 | NaN | NaN | 2700028 | 2700028 | 2700022 |
| FrontAxlePosition | string | 0 | 0.0 | 2 | NaN | NaN | 3690005 | 3690005 | 3690005 |
| WheelBase | int64 | 0 | 0.0 | 42 | 150.0 | 285.0 | 207 | 201 | 213 |
| Overhang | int64 | 0 | 0.0 | 13 | 56.0 | 128.0 | 98 | 62 | 62 |
| FrameRails | string | 0 | 0.0 | 3 | NaN | NaN | 403012 | 403011 | 403011 |
| Liner | string | 0 | 0.0 | 3 | NaN | NaN | 404998 | 404002 | 404002 |
| FrontEndExt | string | 0 | 0.0 | 2 | NaN | NaN | 4070004 | 4070004 | 4070004 |
| Cab | string | 0 | 0.0 | 6 | NaN | NaN | 5000001 | 5000004 | 5000001 |
| RearAxels | string | 0 | 0.0 | 5 | NaN | NaN | 330444 | 330507 | 330444 |
| RearSusp | string | 0 | 0.0 | 4 | NaN | NaN | 3500004 | 3500004 | 3500004 |
| FrontSusp | string | 0 | 0.0 | 5 | NaN | NaN | 3700002 | 3700011 | 3700002 |
| RearWheels | string | 0 | 0.0 | 12 | NaN | NaN | 9140022 | 9142001 | 9142001 |
| RearTires | string | 0 | 0.0 | 5 | NaN | NaN | 933062 | 933469 | 933469 |
| FrontWheels | string | 0 | 0.0 | 11 | NaN | NaN | 9050037 | 9050037 | 9050031 |
| FrontTires | string | 0 | 0.0 | 4 | NaN | NaN | 930469 | 930469 | 930821 |
| TagAxle | string | 0 | 0.0 | 6 | NaN | NaN | 3P1998 | 3P1998 | 3P1998 |
| EngineFamily | string | 0 | 0.0 | 6 | NaN | NaN | 101D97 | 101D102 | 101D97 |
| TransmissionFamily | string | 0 | 0.0 | 2 | NaN | NaN | 270C25 | 270C25 | 270C24 |
sns.boxplot(y=training_numeric['Overhang'])
plt.title('Box and Whisker Plot')
plt.show()
The data is looking much better.
But we will need to keep in mind that we want to deal with the outliers in the testing data at some point.
training.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 2644 entries, 0 to 2643 Data columns (total 23 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 TruckSID 2644 non-null string 1 ActualWeightFront 2644 non-null int64 2 ActualWeightBack 2644 non-null int64 3 ActualWeightTotal 2644 non-null int64 4 Engine 2644 non-null string 5 Transmission 2644 non-null string 6 FrontAxlePosition 2644 non-null string 7 WheelBase 2644 non-null int64 8 Overhang 2644 non-null int64 9 FrameRails 2644 non-null string 10 Liner 2644 non-null string 11 FrontEndExt 2644 non-null string 12 Cab 2644 non-null string 13 RearAxels 2644 non-null string 14 RearSusp 2644 non-null string 15 FrontSusp 2644 non-null string 16 RearWheels 2644 non-null string 17 RearTires 2644 non-null string 18 FrontWheels 2644 non-null string 19 FrontTires 2644 non-null string 20 TagAxle 2644 non-null string 21 EngineFamily 2644 non-null string 22 TransmissionFamily 2644 non-null string dtypes: int64(5), string(18) memory usage: 475.2 KB
training.to_csv('training_super_clean2.csv', index=False)
Let's check the correlations between the numeric features and the target variables.
# Create a mask for the upper triangle.
mask = np.triu(np.ones_like(training_numeric.corr(), dtype=bool))
# Set up the matplotlib figure size.
sns.set(rc={'figure.figsize':(20,10)})
# Draw the heatmap with the mask.
sns.heatmap(training_numeric.corr(), annot=True, mask=mask)
<AxesSubplot:>
We see a slight positive correlation between Back and total from overhang, but no variables stand out as heroes.
Let's check for Heteroskedasticity
def checkForHeteroskedasticity(dependentNames,dependentVariables,independentNames,independentVariables):
for y_name, y in zip(dependentNames, dependentVariables):
for X_name, X in zip(independentNames, independentVariables):
X = sm.add_constant(X)
print(f'{y_name} - {X_name}')
# Fit regression model
model = sm.OLS(y, X).fit()
# Perform Breusch-Pagan test
bp_test = het_breuschpagan(model.resid, model.model.exog)
labels = ['LM Statistic', 'LM-Test p-value', 'F-Statistic', 'F-Test p-value']
results = dict(zip(labels, bp_test))
print(results)
alpha = 0.05
if results['LM-Test p-value'] < alpha or results['F-Test p-value'] < alpha:
print("Reject the null hypothesis: Evidence of heteroskedasticity.")
else:
print("Fail to reject the null hypothesis: No evidence of heteroskedasticity.")
print('')
target_ActualWeightTotal = training_numeric['ActualWeightTotal']
target_ActualWeightFront = training_numeric['ActualWeightFront']
target_ActualWeightBack = training_numeric['ActualWeightBack']
X_WheelBase = training_numeric['WheelBase']
X_Overhang = training_numeric['Overhang']
X_All = training_numeric.drop(['ActualWeightTotal', 'ActualWeightFront', 'ActualWeightBack'], axis=1)
dependentVariables = [target_ActualWeightTotal, target_ActualWeightFront, target_ActualWeightBack]
independentVariables = [X_WheelBase, X_Overhang, X_All]
dependentNames = ['ActualWeightTotal', 'ActualWeightFront', 'ActualWeightBack']
independentNames = ['WheelBase', 'Overhang', 'All']
checkForHeteroskedasticity(dependentNames,dependentVariables,independentNames,independentVariables)
ActualWeightTotal - WheelBase
{'LM Statistic': 8.762936123953516, 'LM-Test p-value': 0.003074137501122203, 'F-Statistic': 8.785424870061815, 'F-Test p-value': 0.0030636132978057976}
Reject the null hypothesis: Evidence of heteroskedasticity.
ActualWeightTotal - Overhang
{'LM Statistic': 3.6970266508277803, 'LM-Test p-value': 0.054509523567570786, 'F-Statistic': 3.6994028753816743, 'F-Test p-value': 0.054539291785276965}
Fail to reject the null hypothesis: No evidence of heteroskedasticity.
ActualWeightTotal - All
{'LM Statistic': 11.745150415183469, 'LM-Test p-value': 0.0028156132013616875, 'F-Statistic': 5.892085686800411, 'F-Test p-value': 0.0027976391354201393}
Reject the null hypothesis: Evidence of heteroskedasticity.
ActualWeightFront - WheelBase
{'LM Statistic': 32.94340347638752, 'LM-Test p-value': 9.488113756834247e-09, 'F-Statistic': 33.333812871194404, 'F-Test p-value': 8.668664400292033e-09}
Reject the null hypothesis: Evidence of heteroskedasticity.
ActualWeightFront - Overhang
{'LM Statistic': 54.98825995311089, 'LM-Test p-value': 1.21251928235889e-13, 'F-Statistic': 56.11368251017964, 'F-Test p-value': 9.27204061053086e-14}
Reject the null hypothesis: Evidence of heteroskedasticity.
ActualWeightFront - All
{'LM Statistic': 49.82460261088168, 'LM-Test p-value': 1.516090098971871e-11, 'F-Statistic': 25.361965815374816, 'F-Test p-value': 1.2299224575118543e-11}
Reject the null hypothesis: Evidence of heteroskedasticity.
ActualWeightBack - WheelBase
{'LM Statistic': 68.88135782553877, 'LM-Test p-value': 1.0456862398888746e-16, 'F-Statistic': 70.67035452059922, 'F-Test p-value': 6.805439212303132e-17}
Reject the null hypothesis: Evidence of heteroskedasticity.
ActualWeightBack - Overhang
{'LM Statistic': 140.58584127260607, 'LM-Test p-value': 1.9820090908421064e-32, 'F-Statistic': 148.36849561921454, 'F-Test p-value': 3.0167609854688856e-33}
Reject the null hypothesis: Evidence of heteroskedasticity.
ActualWeightBack - All
{'LM Statistic': 134.89977627214296, 'LM-Test p-value': 5.091969310887176e-30, 'F-Statistic': 70.9956313752597, 'F-Test p-value': 9.274388741601217e-31}
Reject the null hypothesis: Evidence of heteroskedasticity.
However, when "Overhang" is used to predict either "ActualWeightFront" or "ActualWeightBack", there is evidence that the variance of the model's errors is not constant. This can be problematic as it can lead to unreliable significance tests and confidence intervals.
def threeDScatter(df, category,X,Y,Z):
# Create a color palette from Seaborn
unique_categories = df[category].unique()
palette = dict(zip(unique_categories, sns.color_palette("husl", len(unique_categories))))
colors = df[category].map(palette)
# Set the Seaborn style
sns.set_style("whitegrid")
fig = plt.figure(figsize=(12, 12))
ax = fig.add_subplot(111, projection='3d')
scatter = ax.scatter(df[X], df[Y], df[Z], c=colors, s=60, edgecolors='w', depthshade=True)
# Setting the labels for the axes
ax.set_title(f'{X} vs {Y} vs {Z} for {category}')
ax.set_xlabel(X)
ax.set_ylabel(Y)
ax.set_zlabel(Z)
# Add legend
from matplotlib.lines import Line2D
legend_elements = [Line2D([0], [0], marker='o', color='w', markerfacecolor=palette[key], markersize=10, label=key) for key in palette]
ax.legend(handles=legend_elements, loc='upper right')
plt.show()
threeDScatter(training,'EngineFamily','ActualWeightFront','ActualWeightBack','ActualWeightTotal')
threeDScatter(training,'Engine','ActualWeightFront','ActualWeightBack','ActualWeightTotal')
threeDScatter(training,'Transmission','ActualWeightFront','ActualWeightBack','ActualWeightTotal')
threeDScatter(training,'TransmissionFamily','ActualWeightFront','ActualWeightBack','ActualWeightTotal')
All three features seem to show decent grouping within the data.
🔨 Data Preparation
¶Let's define our X and y value
X = training.drop(columns= targetColumns + ['TruckSID'])
y = target_ActualWeightFront
X
| Engine | Transmission | FrontAxlePosition | WheelBase | Overhang | FrameRails | Liner | FrontEndExt | Cab | RearAxels | RearSusp | FrontSusp | RearWheels | RearTires | FrontWheels | FrontTires | TagAxle | EngineFamily | TransmissionFamily | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1012011 | 2700028 | 3690005 | 249 | 104 | 403012 | 404002 | 4070004 | 5000002 | 330444 | 3500004 | 3700002 | 9140014 | 933469 | 9050015 | 930469 | 3P1998 | 101D100 | 270C25 |
| 1 | 1012011 | 2700022 | 3690005 | 183 | 68 | 403012 | 404002 | 4070004 | 5000004 | 330507 | 3500004 | 3700011 | 9142001 | 933469 | 9050031 | 930821 | 3P1998 | 101D100 | 270C24 |
| 2 | 1012001 | 2700022 | 3690005 | 216 | 68 | 403012 | 404002 | 4070004 | 5000001 | 330444 | 3500004 | 3700002 | 9140014 | 933062 | 9050015 | 930469 | 3P1998 | 101D97 | 270C24 |
| 3 | 1012002 | 2700028 | 3690005 | 219 | 104 | 403012 | 404002 | 4070004 | 5000002 | 330444 | 3500004 | 3700002 | 9140014 | 933062 | 9050015 | 930469 | 3P1998 | 101D97 | 270C25 |
| 4 | 1012019 | 2700028 | 3690005 | 231 | 104 | 403012 | 404002 | 4070004 | 5000001 | 330444 | 3500004 | 3700011 | 9142001 | 933469 | 9050037 | 930469 | 3P1998 | 101D102 | 270C25 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2639 | 1012012 | 2700024 | 3690005 | 210 | 104 | 403012 | 404998 | 4070004 | 5000002 | 3300041 | 3500003 | 3700002 | 9140016 | 933469 | 9050015 | 930469 | 3P1998 | 101D100 | 270C24 |
| 2640 | 1012002 | 2700028 | 3690005 | 210 | 74 | 403012 | 404002 | 4070004 | 5000003 | 330444 | 3500004 | 3700002 | 9140014 | 933062 | 9050015 | 930469 | 3P1998 | 101D97 | 270C25 |
| 2641 | 1012002 | 2700028 | 3690005 | 222 | 80 | 403012 | 404998 | 4070004 | 5000002 | 330444 | 3500014 | 3700002 | 9140014 | 933062 | 9050015 | 930469 | 3P1998 | 101D97 | 270C25 |
| 2642 | 1012011 | 2700022 | 3690005 | 222 | 56 | 403012 | 404002 | 4070004 | 5000001 | 330444 | 3500004 | 3700002 | 9142003 | 933469 | 9052003 | 930469 | 3P1998 | 101D100 | 270C24 |
| 2643 | 1012011 | 2700022 | 3690005 | 198 | 104 | 403012 | 404998 | 4070004 | 5000002 | 330507 | 3500004 | 3700002 | 9142001 | 933469 | 9050037 | 930469 | 3P1998 | 101D100 | 270C24 |
2644 rows × 19 columns
Let's define a pipeline to process the data
# Custom transformer to convert string columns to category
class StringToCategory(BaseEstimator, TransformerMixin):
def fit(self, X, y=None):
return self
def transform(self, X):
string_cols = X.select_dtypes(include=['string']).columns
X[string_cols] = X[string_cols].astype('category')
return X
#Scale the data
class DataFrameScaler(BaseEstimator, TransformerMixin):
def __init__(self):
self.scaler = StandardScaler()
self.columns = None
def fit(self, X, y=None):
self.scaler.fit(X, y)
self.columns = X.columns
return self
def transform(self, X):
X_scaled = self.scaler.transform(X)
return pd.DataFrame(X_scaled, columns=self.columns)
# Create the pipeline
data_prep_pipeline = Pipeline([
('str_to_cat', StringToCategory()),
('target_encode', ce.TargetEncoder()),
('scale', DataFrameScaler())
])
X_transformed = data_prep_pipeline.fit_transform(X, y)
X_transformed
| Engine | Transmission | FrontAxlePosition | WheelBase | Overhang | FrameRails | Liner | FrontEndExt | Cab | RearAxels | RearSusp | FrontSusp | RearWheels | RearTires | FrontWheels | FrontTires | TagAxle | EngineFamily | TransmissionFamily | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.775150 | 1.199613 | 0.085077 | 2.620206 | 0.867729 | 0.647233 | 1.118745 | -0.061616 | -0.946623 | 0.928864 | 0.251651 | 0.402826 | 1.262227 | -0.240839 | 1.224191 | -0.359090 | -0.15539 | -0.774318 | 1.059240 |
| 1 | -0.775150 | -0.917003 | 0.085077 | -1.545882 | -1.391651 | 0.647233 | 1.118745 | -0.061616 | 1.193472 | -0.427545 | 0.251651 | -0.687423 | -0.317981 | -0.240839 | 0.801290 | 1.816748 | -0.15539 | -0.774318 | -0.944073 |
| 2 | 0.194674 | -0.917003 | 0.085077 | 0.537162 | -1.391651 | 0.647233 | 1.118745 | -0.061616 | -0.408809 | 0.928864 | 0.251651 | 0.402826 | 1.262227 | 0.944712 | 1.224191 | -0.359090 | -0.15539 | 0.283715 | -0.944073 |
| 3 | 0.270655 | 1.199613 | 0.085077 | 0.726530 | 0.867729 | 0.647233 | 1.118745 | -0.061616 | -0.946623 | 0.928864 | 0.251651 | 0.402826 | 1.262227 | 0.944712 | 1.224191 | -0.359090 | -0.15539 | 0.283715 | 1.059240 |
| 4 | 1.580971 | 1.199613 | 0.085077 | 1.484000 | 0.867729 | 0.647233 | 1.118745 | -0.061616 | -0.408809 | 0.928864 | 0.251651 | -0.687423 | -0.317981 | -0.240839 | -0.856580 | -0.359090 | -0.15539 | 1.551007 | 1.059240 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2639 | -0.175652 | -0.231394 | 0.085077 | 0.158427 | 0.867729 | 0.647233 | -0.906376 | -0.061616 | -0.946623 | -1.712860 | -3.511279 | 0.402826 | 1.691276 | -0.240839 | 1.224191 | -0.359090 | -0.15539 | -0.774318 | -0.944073 |
| 2640 | 0.270655 | 1.199613 | 0.085077 | 0.158427 | -1.015088 | 0.647233 | 1.118745 | -0.061616 | 2.089485 | 0.928864 | 0.251651 | 0.402826 | 1.262227 | 0.944712 | 1.224191 | -0.359090 | -0.15539 | 0.283715 | 1.059240 |
| 2641 | 0.270655 | 1.199613 | 0.085077 | 0.915898 | -0.638524 | 0.647233 | -0.906376 | -0.061616 | -0.946623 | 0.928864 | 0.517296 | 0.402826 | 1.262227 | 0.944712 | 1.224191 | -0.359090 | -0.15539 | 0.283715 | 1.059240 |
| 2642 | -0.775150 | -0.917003 | 0.085077 | 0.915898 | -2.144777 | 0.647233 | 1.118745 | -0.061616 | -0.408809 | 0.928864 | 0.251651 | 0.402826 | -0.721544 | -0.240839 | -1.037440 | -0.359090 | -0.15539 | -0.774318 | -0.944073 |
| 2643 | -0.775150 | -0.917003 | 0.085077 | -0.599043 | 0.867729 | 0.647233 | -0.906376 | -0.061616 | -0.946623 | -0.427545 | 0.251651 | 0.402826 | -0.317981 | -0.240839 | -0.856580 | -0.359090 | -0.15539 | -0.774318 | -0.944073 |
2644 rows × 19 columns
newData = pd.concat([X_transformed, target_ActualWeightBack], axis=1)
newData = pd.concat([newData, target_ActualWeightFront], axis=1)
newData = pd.concat([newData, target_ActualWeightTotal], axis=1)
newData
| Engine | Transmission | FrontAxlePosition | WheelBase | Overhang | FrameRails | Liner | FrontEndExt | Cab | RearAxels | RearSusp | FrontSusp | RearWheels | RearTires | FrontWheels | FrontTires | TagAxle | EngineFamily | TransmissionFamily | ActualWeightBack | ActualWeightFront | ActualWeightTotal | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.775150 | 1.199613 | 0.085077 | 2.620206 | 0.867729 | 0.647233 | 1.118745 | -0.061616 | -0.946623 | 0.928864 | 0.251651 | 0.402826 | 1.262227 | -0.240839 | 1.224191 | -0.359090 | -0.15539 | -0.774318 | 1.059240 | 8030 | 11280 | 19310 |
| 1 | -0.775150 | -0.917003 | 0.085077 | -1.545882 | -1.391651 | 0.647233 | 1.118745 | -0.061616 | 1.193472 | -0.427545 | 0.251651 | -0.687423 | -0.317981 | -0.240839 | 0.801290 | 1.816748 | -0.15539 | -0.774318 | -0.944073 | 6660 | 10720 | 17380 |
| 2 | 0.194674 | -0.917003 | 0.085077 | 0.537162 | -1.391651 | 0.647233 | 1.118745 | -0.061616 | -0.408809 | 0.928864 | 0.251651 | 0.402826 | 1.262227 | 0.944712 | 1.224191 | -0.359090 | -0.15539 | 0.283715 | -0.944073 | 6230 | 11040 | 17270 |
| 3 | 0.270655 | 1.199613 | 0.085077 | 0.726530 | 0.867729 | 0.647233 | 1.118745 | -0.061616 | -0.946623 | 0.928864 | 0.251651 | 0.402826 | 1.262227 | 0.944712 | 1.224191 | -0.359090 | -0.15539 | 0.283715 | 1.059240 | 7430 | 11210 | 18640 |
| 4 | 1.580971 | 1.199613 | 0.085077 | 1.484000 | 0.867729 | 0.647233 | 1.118745 | -0.061616 | -0.408809 | 0.928864 | 0.251651 | -0.687423 | -0.317981 | -0.240839 | -0.856580 | -0.359090 | -0.15539 | 1.551007 | 1.059240 | 7510 | 11910 | 19420 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2639 | -0.175652 | -0.231394 | 0.085077 | 0.158427 | 0.867729 | 0.647233 | -0.906376 | -0.061616 | -0.946623 | -1.712860 | -3.511279 | 0.402826 | 1.691276 | -0.240839 | 1.224191 | -0.359090 | -0.15539 | -0.774318 | -0.944073 | 9830 | 10110 | 19940 |
| 2640 | 0.270655 | 1.199613 | 0.085077 | 0.158427 | -1.015088 | 0.647233 | 1.118745 | -0.061616 | 2.089485 | 0.928864 | 0.251651 | 0.402826 | 1.262227 | 0.944712 | 1.224191 | -0.359090 | -0.15539 | 0.283715 | 1.059240 | 6700 | 11150 | 17850 |
| 2641 | 0.270655 | 1.199613 | 0.085077 | 0.915898 | -0.638524 | 0.647233 | -0.906376 | -0.061616 | -0.946623 | 0.928864 | 0.517296 | 0.402826 | 1.262227 | 0.944712 | 1.224191 | -0.359090 | -0.15539 | 0.283715 | 1.059240 | 7020 | 10850 | 17870 |
| 2642 | -0.775150 | -0.917003 | 0.085077 | 0.915898 | -2.144777 | 0.647233 | 1.118745 | -0.061616 | -0.408809 | 0.928864 | 0.251651 | 0.402826 | -0.721544 | -0.240839 | -1.037440 | -0.359090 | -0.15539 | -0.774318 | -0.944073 | 6850 | 10380 | 17230 |
| 2643 | -0.775150 | -0.917003 | 0.085077 | -0.599043 | 0.867729 | 0.647233 | -0.906376 | -0.061616 | -0.946623 | -0.427545 | 0.251651 | 0.402826 | -0.317981 | -0.240839 | -0.856580 | -0.359090 | -0.15539 | -0.774318 | -0.944073 | 8760 | 9820 | 18580 |
2644 rows × 22 columns
mask = np.triu(np.ones_like(newData.corr(), dtype=bool))
# Set up the matplotlib figure size.
sns.set(rc={'figure.figsize':(20,10)})
# Draw the heatmap with the mask.
sns.heatmap(newData.corr(), annot=True, mask=mask)
<AxesSubplot:>
🏆 Model Selection
¶Let's see if we can get a good idea of how many estimators we should use as a reference.
X_train, X_test, y_train, y_test = train_test_split(X_transformed, target_ActualWeightFront, test_size=0.3, random_state=42)
# Number of trees to evaluate
n_trees = list(range(10, 201, 10))
oob_errors = []
for n in n_trees:
model = RandomForestRegressor(n_estimators=n, oob_score=True, random_state=42, n_jobs=-1)
model.fit(X_train, y_train)
oob_errors.append(1 - model.oob_score_)
plt.figure(figsize=(12, 6))
plt.plot(n_trees, oob_errors, '-o')
plt.title('OOB Error Across Different Numbers of Trees')
plt.xlabel('Number of Trees')
plt.ylabel('OOB Error')
plt.show()
When testing estimators let's use 30 as a point of reference.
from sklearn.ensemble import AdaBoostRegressor, BaggingRegressor, ExtraTreesRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import ElasticNet, BayesianRidge, ARDRegression, PassiveAggressiveRegressor, HuberRegressor, TheilSenRegressor, RANSACRegressor
from sklearn.svm import LinearSVR
from sklearn.kernel_ridge import KernelRidge
import pandas as pd
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.svm import SVR
import numpy as np
#Add medals to the best scoring algorithms
def append_medals(df):
# For R2, higher is better
top3_r2 = df['R2 Score'].nlargest(3).index
df.at[top3_r2[0], 'R2 Score'] = str(df['R2 Score'].iloc[top3_r2[0]]) + ' 🏆'
df.at[top3_r2[1], 'R2 Score'] = str(df['R2 Score'].iloc[top3_r2[1]]) + ' 🥈'
df.at[top3_r2[2], 'R2 Score'] = str(df['R2 Score'].iloc[top3_r2[2]]) + ' 🥉'
# For the rest, lower is better
metrics = ['RMSE', 'MSE', 'MAE', 'MAPE', 'SMAPE']
for metric in metrics:
top3_indices = df[metric].nsmallest(3).index
df.at[top3_indices[0], metric] = str(df[metric].iloc[top3_indices[0]]) + ' 🏆'
df.at[top3_indices[1], metric] = str(df[metric].iloc[top3_indices[1]]) + ' 🥈'
df.at[top3_indices[2], metric] = str(df[metric].iloc[top3_indices[2]]) + ' 🥉'
return df
def mean_absolute_percentage_error(y_true, y_pred):
y_true, y_pred = np.array(y_true), np.array(y_pred)
return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
def smape(y_true, y_pred):
return 100 * np.mean(2 * np.abs(y_pred - y_true) / (np.abs(y_pred) + np.abs(y_true)))
def evaluate_models(models, X, y, k_folds=5):
final_results = []
kf = KFold(n_splits=k_folds, shuffle=True, random_state=42)
for model in models:
results = []
for train_index, test_index in kf.split(X):
X_train, X_test = X.iloc[train_index], X.iloc[test_index]
y_train, y_test = y.iloc[train_index], y.iloc[test_index]
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
r2 = round(r2_score(y_test, y_pred),4)
rmse = round(np.sqrt(mean_squared_error(y_test, y_pred)),4)
mse = round(mean_squared_error(y_test, y_pred),4)
mae = mean_absolute_error(y_test, y_pred)
mape = mean_absolute_percentage_error(y_test, y_pred)
smape_val = smape(y_test, y_pred)
results.append([r2, rmse, mse, mae, mape, smape_val])
# Calculate average scores
avg_scores = [round(np.mean([result[i] for result in results]), 4) for i in range(len(results[0]))]
final_results.append([type(model).__name__] + avg_scores)
print("✔️ {}".format(type(model).__name__))
columns = ["Model", "R2 Score", "RMSE", "MSE", "MAE", "MAPE", "SMAPE"]
results_df = pd.DataFrame(final_results, columns=columns)
# Append ✔️ to the best scores
results_df = append_medals(results_df)
return results_df
models = [
# Ensemble Methods
RandomForestRegressor(n_estimators=30, random_state=0),
GradientBoostingRegressor(n_estimators=30, random_state=0),
AdaBoostRegressor(n_estimators=30, random_state=0),
BaggingRegressor(n_estimators=30, random_state=0),
ExtraTreesRegressor(n_estimators=30, random_state=0),
# Linear Models
LinearRegression(),
Ridge(),
Lasso(),
ElasticNet(),
BayesianRidge(),
ARDRegression(),
PassiveAggressiveRegressor(),
HuberRegressor(),
TheilSenRegressor(),
RANSACRegressor(),
# SVM
SVR(),
LinearSVR(),
# Neighbors
KNeighborsRegressor(),
# Neural Network
MLPRegressor(hidden_layer_sizes=(100,), max_iter=500, random_state=0),
# Tree-based
DecisionTreeRegressor(),
XGBRegressor(n_estimators=30, random_state=0),
LGBMRegressor(n_estimators=30, random_state=0),
# Kernel Ridge Regression
KernelRidge(),
]
Predicting Total
¶y = target_ActualWeightTotal
X_transformed = data_prep_pipeline.fit_transform(X, y)
results_df_total = evaluate_models(models, X_transformed, y)
results_df_total
✔️ RandomForestRegressor ✔️ GradientBoostingRegressor ✔️ AdaBoostRegressor ✔️ BaggingRegressor ✔️ ExtraTreesRegressor ✔️ LinearRegression ✔️ Ridge ✔️ Lasso ✔️ ElasticNet ✔️ BayesianRidge ✔️ ARDRegression ✔️ PassiveAggressiveRegressor ✔️ HuberRegressor ✔️ TheilSenRegressor ✔️ RANSACRegressor ✔️ SVR ✔️ LinearSVR ✔️ KNeighborsRegressor ✔️ MLPRegressor ✔️ DecisionTreeRegressor ✔️ XGBRegressor ✔️ LGBMRegressor ✔️ KernelRidge
| Model | R2 Score | RMSE | MSE | MAE | MAPE | SMAPE | |
|---|---|---|---|---|---|---|---|
| 0 | RandomForestRegressor | 0.8011 🥈 | 478.7839 🥈 | 230298.5194 🥈 | 290.2632 🥈 | 1.6181 🥈 | 1.6119 🥈 |
| 1 | GradientBoostingRegressor | 0.7269 | 563.3094 | 317832.8091 | 404.0855 | 2.2576 | 2.2482 |
| 2 | AdaBoostRegressor | 0.5724 | 705.6288 | 498249.4218 | 546.5403 | 3.0514 | 3.0409 |
| 3 | BaggingRegressor | 0.8011 🥉 | 478.8759 🥉 | 230345.9915 🥉 | 290.4994 🥉 | 1.6193 🥉 | 1.6131 🥉 |
| 4 | ExtraTreesRegressor | 0.7987 | 481.9655 | 233258.4131 | 288.942 🏆 | 1.6095 🏆 | 1.6037 🏆 |
| 5 | LinearRegression | 0.6614 | 627.4376 | 394391.666 | 459.6826 | 2.5602 | 2.5534 |
| 6 | Ridge | 0.6615 | 627.4257 | 394374.8146 | 459.6643 | 2.5601 | 2.5533 |
| 7 | Lasso | 0.6615 | 627.3773 | 394296.7737 | 459.4429 | 2.5588 | 2.5521 |
| 8 | ElasticNet | 0.6397 | 647.966 | 420133.1125 | 483.4911 | 2.6924 | 2.6869 |
| 9 | BayesianRidge | 0.6615 | 627.3957 | 394308.1675 | 459.5972 | 2.5597 | 2.553 |
| 10 | ARDRegression | 0.6616 | 627.3131 | 394215.7576 | 458.5653 | 2.5535 | 2.5469 |
| 11 | PassiveAggressiveRegressor | 0.6513 | 636.9409 | 406414.8129 | 459.6518 | 2.5601 | 2.5517 |
| 12 | HuberRegressor | 0.6577 | 630.8401 | 398798.3915 | 456.886 | 2.5433 | 2.5362 |
| 13 | TheilSenRegressor | -4.0342 | 2334.9025 | 5732687.854 | 927.0029 | 5.1288 | 6.4341 |
| 14 | RANSACRegressor | 0.437 | 797.5331 | 645369.6614 | 567.6713 | 3.1597 | 3.163 |
| 15 | SVR | 0.1293 | 1008.7353 | 1018526.4218 | 806.9198 | 4.4698 | 4.4829 |
| 16 | LinearSVR | -216.1259 | 15903.7851 | 252932788.781 | 15866.4447 | 88.321 | 158.2088 |
| 17 | KNeighborsRegressor | 0.757 | 528.8404 | 281208.5656 | 325.7707 | 1.8184 | 1.8106 |
| 18 | MLPRegressor | -42.7834 | 7141.5165 | 51030007.1392 | 6199.4957 | 34.7312 | 45.495 |
| 19 | DecisionTreeRegressor | 0.7872 | 495.6468 | 246499.6868 | 294.6707 | 1.6412 | 1.6361 |
| 20 | XGBRegressor | 0.8059 🏆 | 473.5516 🏆 | 225130.7845 🏆 | 291.2641 | 1.6239 | 1.6177 |
| 21 | LGBMRegressor | 0.7844 | 499.2884 | 250318.1475 | 321.993 | 1.7957 | 1.7877 |
| 22 | KernelRidge | -283.0564 | 18186.9662 | 330774343.0359 | 18144.7583 | 101.3038 | 191.9926 |
Predicting Front
¶y = target_ActualWeightFront
X_transformed = data_prep_pipeline.fit_transform(X, y)
results_df_total = evaluate_models(models, X_transformed, y)
results_df_total
✔️ RandomForestRegressor ✔️ GradientBoostingRegressor ✔️ AdaBoostRegressor ✔️ BaggingRegressor ✔️ ExtraTreesRegressor ✔️ LinearRegression ✔️ Ridge ✔️ Lasso ✔️ ElasticNet ✔️ BayesianRidge ✔️ ARDRegression ✔️ PassiveAggressiveRegressor ✔️ HuberRegressor ✔️ TheilSenRegressor ✔️ RANSACRegressor ✔️ SVR ✔️ LinearSVR ✔️ KNeighborsRegressor ✔️ MLPRegressor ✔️ DecisionTreeRegressor ✔️ XGBRegressor ✔️ LGBMRegressor ✔️ KernelRidge
| Model | R2 Score | RMSE | MSE | MAE | MAPE | SMAPE | |
|---|---|---|---|---|---|---|---|
| 0 | RandomForestRegressor | 0.8953 🏆 | 229.5118 🏆 | 53085.1121 🏆 | 130.2841 🥉 | 1.2182 🥈 | 1.2143 🥈 |
| 1 | GradientBoostingRegressor | 0.856 | 268.9559 | 73098.1024 | 179.3228 | 1.6784 | 1.6704 |
| 2 | AdaBoostRegressor | 0.7273 | 371.5134 | 138157.1979 | 272.4418 | 2.5413 | 2.5364 |
| 3 | BaggingRegressor | 0.8952 🥈 | 229.5995 🥈 | 53129.3674 🥈 | 130.0752 🏆 | 1.2162 🏆 | 1.2124 🏆 |
| 4 | ExtraTreesRegressor | 0.8903 | 234.6574 | 55478.9068 | 130.2439 🥈 | 1.2183 🥉 | 1.2145 🥉 |
| 5 | LinearRegression | 0.8335 | 289.4355 | 84430.4047 | 199.5315 | 1.8616 | 1.8533 |
| 6 | Ridge | 0.8334 | 289.4616 | 84442.4363 | 199.5924 | 1.8621 | 1.8539 |
| 7 | Lasso | 0.8332 | 289.6671 | 84553.998 | 200.0872 | 1.8667 | 1.8585 |
| 8 | ElasticNet | 0.8065 | 312.5823 | 98204.519 | 226.3124 | 2.1056 | 2.0976 |
| 9 | BayesianRidge | 0.8331 | 289.7846 | 84611.4152 | 199.9467 | 1.8652 | 1.857 |
| 10 | ARDRegression | 0.8332 | 289.6515 | 84555.3324 | 199.9103 | 1.8651 | 1.8569 |
| 11 | PassiveAggressiveRegressor | 0.8235 | 297.9253 | 89457.7342 | 200.3422 | 1.8712 | 1.8609 |
| 12 | HuberRegressor | 0.831 | 291.5693 | 85663.8733 | 196.1196 | 1.8327 | 1.8231 |
| 13 | TheilSenRegressor | -4.1268 | 1576.5451 | 2562624.5787 | 537.2081 | 4.9534 | 6.559 |
| 14 | RANSACRegressor | 0.6866 | 393.6625 | 157507.5776 | 237.2946 | 2.2091 | 2.2059 |
| 15 | SVR | 0.3097 | 591.3392 | 349863.4979 | 441.7173 | 4.0401 | 4.0742 |
| 16 | LinearSVR | -148.773 | 8705.4732 | 75785957.276 | 8674.241 | 80.5311 | 134.9252 |
| 17 | KNeighborsRegressor | 0.8636 | 262.4784 | 69167.6763 | 156.3438 | 1.4572 | 1.452 |
| 18 | MLPRegressor | -20.4212 | 3289.0527 | 10837852.3505 | 2582.4927 | 23.9279 | 28.698 |
| 19 | DecisionTreeRegressor | 0.8919 | 232.9797 | 54802.9402 | 130.9895 | 1.2258 | 1.2213 |
| 20 | XGBRegressor | 0.8932 🥉 | 231.5479 🥉 | 54148.5461 🥉 | 131.659 | 1.2321 | 1.2269 |
| 21 | LGBMRegressor | 0.8814 | 244.2603 | 60211.5009 | 148.7239 | 1.3882 | 1.382 |
| 22 | KernelRidge | -234.6551 | 10917.1252 | 119188290.7061 | 10891.1801 | 101.5747 | 190.5893 |
Predicting Back
¶y = target_ActualWeightBack
X_transformed = data_prep_pipeline.fit_transform(X, y)
results_df_total = evaluate_models(models, X_transformed, y)
results_df_total
✔️ RandomForestRegressor ✔️ GradientBoostingRegressor ✔️ AdaBoostRegressor ✔️ BaggingRegressor ✔️ ExtraTreesRegressor ✔️ LinearRegression ✔️ Ridge ✔️ Lasso ✔️ ElasticNet ✔️ BayesianRidge ✔️ ARDRegression ✔️ PassiveAggressiveRegressor ✔️ HuberRegressor ✔️ TheilSenRegressor ✔️ RANSACRegressor ✔️ SVR ✔️ LinearSVR ✔️ KNeighborsRegressor ✔️ MLPRegressor ✔️ DecisionTreeRegressor ✔️ XGBRegressor ✔️ LGBMRegressor ✔️ KernelRidge
| Model | R2 Score | RMSE | MSE | MAE | MAPE | SMAPE | |
|---|---|---|---|---|---|---|---|
| 0 | RandomForestRegressor | 0.7872 🥉 | 410.7583 🥉 | 169074.8226 🥉 | 244.1078 🥉 | 3.4455 🥉 | 3.4046 🥉 |
| 1 | GradientBoostingRegressor | 0.6762 | 507.7242 | 258141.6621 | 358.9169 | 5.0233 | 4.9609 |
| 2 | AdaBoostRegressor | 0.523 | 615.0411 | 379359.7921 | 462.0963 | 6.5471 | 6.4033 |
| 3 | BaggingRegressor | 0.7879 🥈 | 410.0674 🥈 | 168522.2289 🥈 | 243.7857 🥈 | 3.4402 🥈 | 3.4002 🥈 |
| 4 | ExtraTreesRegressor | 0.7827 | 415.1815 | 172686.9054 | 244.5569 | 3.4479 | 3.4095 |
| 5 | LinearRegression | 0.471 | 648.7058 | 421243.5064 | 505.3203 | 7.0058 | 6.9315 |
| 6 | Ridge | 0.4711 | 648.6939 | 421226.9838 | 505.336 | 7.0059 | 6.9317 |
| 7 | Lasso | 0.4713 | 648.5511 | 421031.8387 | 505.5588 | 7.0075 | 6.9341 |
| 8 | ElasticNet | 0.4419 | 666.7692 | 444789.351 | 523.9646 | 7.2307 | 7.1718 |
| 9 | BayesianRidge | 0.4712 | 648.6101 | 421097.628 | 505.7998 | 7.0098 | 6.9371 |
| 10 | ARDRegression | 0.4717 | 648.4037 | 420813.7174 | 505.6685 | 7.0061 | 6.9347 |
| 11 | PassiveAggressiveRegressor | 0.4531 | 659.9559 | 435922.7762 | 506.6436 | 7.0153 | 6.9496 |
| 12 | HuberRegressor | 0.4649 | 652.5355 | 426176.4484 | 505.1507 | 7.0135 | 6.9393 |
| 13 | TheilSenRegressor | -0.4491 | 1054.0084 | 1143200.2552 | 664.7454 | 9.2305 | 10.3724 |
| 14 | RANSACRegressor | 0.1243 | 832.885 | 695611.9439 | 605.0011 | 8.4006 | 8.3753 |
| 15 | SVR | 0.0616 | 865.4469 | 749460.4848 | 667.5756 | 9.1126 | 9.2104 |
| 16 | LinearSVR | -32.7431 | 5184.405 | 26879375.009 | 5104.7977 | 70.5015 | 109.1993 |
| 17 | KNeighborsRegressor | 0.7301 | 462.1175 | 214393.8754 | 277.6469 | 3.9305 | 3.8735 |
| 18 | MLPRegressor | -1.4216 | 1383.7872 | 1920399.3398 | 1065.5495 | 14.9436 | 16.3003 |
| 19 | DecisionTreeRegressor | 0.7647 | 431.59 | 186756.6538 | 250.1442 | 3.5247 | 3.4848 |
| 20 | XGBRegressor | 0.7918 🏆 | 406.6225 🏆 | 165583.6396 🏆 | 243.4098 🏆 | 3.4342 🏆 | 3.3929 🏆 |
| 21 | LGBMRegressor | 0.7686 | 429.0368 | 184357.0731 | 272.2088 | 3.8217 | 3.7785 |
| 22 | KernelRidge | -66.1968 | 7316.1791 | 53527705.4363 | 7273.5247 | 101.8426 | 188.0728 |
👨🔬 Feature Engineering
¶Feature Engineer Front Model
¶X = training.drop(columns= targetColumns + ['TruckSID'])
y = target_ActualWeightFront
X_transformed = data_prep_pipeline.fit_transform(X, y)
X_train, X_test, y_train, y_test = train_test_split(X_transformed, y, test_size=0.3, random_state=0)
Let's get a baseline for how well our model is performing.
model = XGBRegressor(random_state=42)
scores = -cross_val_score(model, X_train, y_train, cv=5, scoring="neg_mean_absolute_error")
print(scores)
print("MAE score: {}".format(np.mean(scores)))
[109.83327439 140.11404666 123.61776288 129.49583773 140.21197477] MAE score: 128.65457928631758
Let's start by making as many features as we can think of, then then start reducing them until we get the perfect score.
def FE(Training):
'''
# Position Analysis:
Training['sum_wheelbase_per_front_axle_position'] = Training.groupby('FrontAxlePosition')['WheelBase'].transform('sum')
Training['max_wheelbase_per_front_axle_position'] = Training.groupby('FrontAxlePosition')['WheelBase'].transform('max')
Training['min_wheelbase_per_front_axle_position'] = Training.groupby('FrontAxlePosition')['WheelBase'].transform('min')
Training['std_wheelbase_per_front_axle_position'] = Training.groupby('FrontAxlePosition')['WheelBase'].transform('std')
Training['sum_overhang_per_front_axle_position'] = Training.groupby('FrontAxlePosition')['Overhang'].transform('sum')
Training['max_overhang_per_front_axle_position'] = Training.groupby('FrontAxlePosition')['Overhang'].transform('max')
Training['min_overhang_per_front_axle_position'] = Training.groupby('FrontAxlePosition')['Overhang'].transform('min')
Training['std_overhang_per_front_axle_position'] = Training.groupby('FrontAxlePosition')['Overhang'].transform('std')
Training['avg_wheelbase_per_front_axle_position'] = Training.groupby('FrontAxlePosition')['WheelBase'].transform('mean')
Training['avg_overhang_per_front_axle_position'] = Training.groupby('FrontAxlePosition')['Overhang'].transform('mean')
# Position Analysis (Wheelbase and Overhang):
Training['sum_wheelbase_per_front_axle_position'] = Training.groupby('FrontAxlePosition')['WheelBase'].transform('sum')
Training['max_wheelbase_per_front_axle_position'] = Training.groupby('FrontAxlePosition')['WheelBase'].transform('max')
Training['min_wheelbase_per_front_axle_position'] = Training.groupby('FrontAxlePosition')['WheelBase'].transform('min')
Training['std_wheelbase_per_front_axle_position'] = Training.groupby('FrontAxlePosition')['WheelBase'].transform('std')
Training['sum_overhang_per_front_axle_position'] = Training.groupby('FrontAxlePosition')['Overhang'].transform('sum')
Training['max_overhang_per_front_axle_position'] = Training.groupby('FrontAxlePosition')['Overhang'].transform('max')
Training['min_overhang_per_front_axle_position'] = Training.groupby('FrontAxlePosition')['Overhang'].transform('min')
Training['std_overhang_per_front_axle_position'] = Training.groupby('FrontAxlePosition')['Overhang'].transform('std')
'''
# Interaction Features:
Training['Engine_Transmission'] = Training['Engine'] * Training['Transmission']
Training['TransmissionFamily_EngineFamily'] = Training['TransmissionFamily'] * Training['EngineFamily']
# Polynomial Features for numeric variables:
Training['WheelBase_squared'] = Training['WheelBase'] ** 2
Training['Overhang_squared'] = Training['Overhang'] ** 2
# Ratio Features:
Training['Front_to_Rear_Wheels'] = Training['FrontWheels'] / (Training['RearWheels'] + 0.001) # Add a small number to avoid division by zero
Training['WheelBase_to_Overhang'] = Training['WheelBase'] / (Training['Overhang'] + 0.001)
# Aggregated Features for TransmissionFamily and EngineFamily:
Training['avg_WheelBase_per_TransmissionFamily'] = Training.groupby('TransmissionFamily')['WheelBase'].transform('mean')
Training['avg_Overhang_per_EngineFamily'] = Training.groupby('EngineFamily')['Overhang'].transform('mean')
# Features based on other columns:
Training['sum_WheelBase_per_Engine'] = Training.groupby('Engine')['WheelBase'].transform('sum')
Training['max_Overhang_per_Transmission'] = Training.groupby('Transmission')['Overhang'].transform('max')
#Training['Transmission_EngineFamily'] = Training['Transmission'] * Training['EngineFamily']
'''
# Standard deviation relative to the mean (Coefficient of Variation):
Training['cv_WheelBase_per_FrontAxlePosition'] = Training['std_wheelbase_per_front_axle_position'] / (Training['avg_wheelbase_per_front_axle_position'] + 0.001)
Training['cv_Overhang_per_FrontAxlePosition'] = Training['std_overhang_per_front_axle_position'] / (Training['avg_overhang_per_front_axle_position'] + 0.001)
#Interactions with Top Features:
Training['Engine_TransmissionFamily'] = Training['Engine'] * Training['TransmissionFamily']
#Aggregations with Other Features:
Training['mean_WheelBase_per_Transmission'] = Training.groupby('Transmission')['WheelBase'].transform('mean')
Training['mean_Overhang_per_EngineFamily'] = Training.groupby('EngineFamily')['Overhang'].transform('mean')
Training['sum_FrontWheels_per_Transmission'] = Training.groupby('Transmission')['FrontWheels'].transform('sum')
#Cumulative Sum and Diff Features:
# Assuming some sort of order (like time), adjust as necessary
Training['cumsum_WheelBase'] = Training['WheelBase'].cumsum()
Training['cumsum_Overhang'] = Training['Overhang'].cumsum()
Training['diff_WheelBase'] = Training['WheelBase'].diff()
Training['diff_Overhang'] = Training['Overhang'].diff()
#Bin-based Features:
Training['WheelBase_bins'] = pd.cut(Training['WheelBase'], bins=5, labels=False) # 5 bins, can adjust
Training['mean_Overhang_per_WheelBase_bins'] = Training.groupby('WheelBase_bins')['Overhang'].transform('mean')
'''
return Training
X_train_fe = FE(X_train)
X_test_fe = FE(X_test)
All the features where graphed to help find the most useful ones.
model = XGBRegressor(random_state=42)
model.fit(X_train_fe, y_train)
scores = -cross_val_score(model, X_train_fe, y_train, cv=5, scoring="neg_mean_absolute_error")
print(scores)
print("Mean:", scores.mean())
xgb.plot_importance(model)
plt.show()
[109.93013091 138.35848818 126.57804054 129.82614548 140.92304951] Mean: 129.12317092483107
feature_importances = model.feature_importances_
# Create a DataFrame for better manipulation
importance_df = pd.DataFrame({
'Feature': X_train.columns,
'Importance': feature_importances
})
importance_df_sorted = importance_df.sort_values(by='Importance', ascending=False)
importance_df_sorted.to_csv('importance_df_sorted.csv', index=False)
importance_df_sorted
| Feature | Importance | |
|---|---|---|
| 18 | TransmissionFamily | 0.368260 |
| 1 | Transmission | 0.233128 |
| 17 | EngineFamily | 0.129587 |
| 0 | Engine | 0.105301 |
| 16 | TagAxle | 0.042307 |
| 14 | FrontWheels | 0.041407 |
| 6 | Liner | 0.019834 |
| 24 | WheelBase_to_Overhang | 0.007705 |
| 4 | Overhang | 0.006767 |
| 3 | WheelBase | 0.006328 |
| 8 | Cab | 0.004250 |
| 28 | max_Overhang_per_Transmission | 0.004218 |
| 26 | avg_Overhang_per_EngineFamily | 0.003950 |
| 21 | WheelBase_squared | 0.003193 |
| 12 | RearWheels | 0.002884 |
| 5 | FrameRails | 0.002641 |
| 10 | RearSusp | 0.002446 |
| 19 | Engine_Transmission | 0.002134 |
| 23 | Front_to_Rear_Wheels | 0.002083 |
| 22 | Overhang_squared | 0.002062 |
| 13 | RearTires | 0.001861 |
| 20 | TransmissionFamily_EngineFamily | 0.001588 |
| 11 | FrontSusp | 0.001357 |
| 9 | RearAxels | 0.001317 |
| 15 | FrontTires | 0.001300 |
| 27 | sum_WheelBase_per_Engine | 0.000877 |
| 2 | FrontAxlePosition | 0.000817 |
| 7 | FrontEndExt | 0.000398 |
| 25 | avg_WheelBase_per_TransmissionFamily | 0.000000 |
model = xgb.train({"learning_rate": 0.1}, xgb.DMatrix(X_train_fe, label=y_train), 100)
# Specifically for XGBoost, you can use the TreeExplainer which works best for tree-based models
explainer = shap.TreeExplainer(model)
# Compute SHAP values for a sample of the test set
shap_values = explainer.shap_values(X_test_fe)
# Visualize the first prediction's explanation
shap.initjs()
shap.force_plot(explainer.expected_value, shap_values[0], X_test_fe.iloc[0])
shap.summary_plot(shap_values, X_test_fe)
# For a dependence plot for a specific feature
for i in range(5):
feature_name = X_test.columns[i] # get the name of the first feature
shap.dependence_plot(feature_name, shap_values, X_test_fe)
Features where filtered out based on an arbitrary number I fiddled with until I got the best results
# Filter out features with importance less importance
#0.001927
selected_features = importance_df[importance_df['Importance'] >= 0.001927]['Feature'].tolist()
# Subset the dataset
X_train_selected = X_train[selected_features]
X_test_selected = X_test[selected_features]
model = XGBRegressor(random_state=42) # or xgb.XGBClassifier() for classification
scores = -cross_val_score(model, X_train_selected, y_train, cv=5, scoring="neg_mean_absolute_error")
print(scores)
print("Mean:", scores.mean())
[108.81307802 141.39698849 126.0712178 130.55963894 137.75222762] Mean: 128.9186301731419
We see a decent difference with the new and improved features.
X_train_selected.to_csv('X_train_selected.csv', index=False)
front_selected_columns = X_train_selected.columns
front_selected_columns
Index(['Engine', 'Transmission', 'WheelBase', 'Overhang', 'FrameRails',
'Liner', 'Cab', 'RearSusp', 'RearWheels', 'FrontWheels', 'TagAxle',
'EngineFamily', 'TransmissionFamily', 'Engine_Transmission',
'WheelBase_squared', 'Overhang_squared', 'Front_to_Rear_Wheels',
'WheelBase_to_Overhang', 'avg_Overhang_per_EngineFamily',
'max_Overhang_per_Transmission'],
dtype='object')
These were deemed the important features
X_transformed_fe = FE(X_transformed)
X_transformed_fe = X_transformed_fe[front_selected_columns]
X_transformed_fe
| Engine | Transmission | WheelBase | Overhang | FrameRails | Liner | Cab | RearSusp | RearWheels | FrontWheels | TagAxle | EngineFamily | TransmissionFamily | Engine_Transmission | WheelBase_squared | Overhang_squared | Front_to_Rear_Wheels | WheelBase_to_Overhang | avg_Overhang_per_EngineFamily | max_Overhang_per_Transmission | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.775150 | 1.199613 | 2.620206 | 0.867729 | 0.647233 | 1.118745 | -0.946623 | 0.251651 | 1.262227 | 1.224191 | -0.15539 | -0.774318 | 1.059240 | -0.929880 | 6.865481 | 0.752953 | 0.969098 | 3.016139 | -0.089933 | 2.373981 |
| 1 | -0.775150 | -0.917003 | -1.545882 | -1.391651 | 0.647233 | 1.118745 | 1.193472 | 0.251651 | -0.317981 | 0.801290 | -0.15539 | -0.774318 | -0.944073 | 0.710815 | 2.389750 | 1.936692 | -2.527878 | 1.111625 | -0.089933 | 1.244292 |
| 2 | 0.194674 | -0.917003 | 0.537162 | -1.391651 | 0.647233 | 1.118745 | -0.408809 | 0.251651 | 1.262227 | 1.224191 | -0.15539 | 0.283715 | -0.944073 | -0.178516 | 0.288543 | 1.936692 | 0.969098 | -0.386267 | -0.073598 | 1.244292 |
| 3 | 0.270655 | 1.199613 | 0.726530 | 0.867729 | 0.647233 | 1.118745 | -0.946623 | 0.251651 | 1.262227 | 1.224191 | -0.15539 | 0.283715 | 1.059240 | 0.324682 | 0.527846 | 0.752953 | 0.969098 | 0.836314 | -0.073598 | 2.373981 |
| 4 | 1.580971 | 1.199613 | 1.484000 | 0.867729 | 0.647233 | 1.118745 | -0.408809 | 0.251651 | -0.317981 | -0.856580 | -0.15539 | 1.551007 | 1.059240 | 1.896553 | 2.202257 | 0.752953 | 2.702304 | 1.708244 | 0.409488 | 2.373981 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2639 | -0.175652 | -0.231394 | 0.158427 | 0.867729 | 0.647233 | -0.906376 | -0.946623 | -3.511279 | 1.691276 | 1.224191 | -0.15539 | -0.774318 | -0.944073 | 0.040645 | 0.025099 | 0.752953 | 0.723399 | 0.182367 | -0.089933 | 0.867729 |
| 2640 | 0.270655 | 1.199613 | 0.158427 | -1.015088 | 0.647233 | 1.118745 | 2.089485 | 0.251651 | 1.262227 | 1.224191 | -0.15539 | 0.283715 | 1.059240 | 0.324682 | 0.025099 | 1.030403 | 0.969098 | -0.156226 | -0.073598 | 2.373981 |
| 2641 | 0.270655 | 1.199613 | 0.915898 | -0.638524 | 0.647233 | -0.906376 | -0.946623 | 0.517296 | 1.262227 | 1.224191 | -0.15539 | 0.283715 | 1.059240 | 0.324682 | 0.838868 | 0.407713 | 0.969098 | -1.436647 | -0.073598 | 2.373981 |
| 2642 | -0.775150 | -0.917003 | 0.915898 | -2.144777 | 0.647233 | 1.118745 | -0.408809 | 0.251651 | -0.721544 | -1.037440 | -0.15539 | -0.774318 | -0.944073 | 0.710815 | 0.838868 | 4.600069 | 1.439802 | -0.427235 | -0.089933 | 1.244292 |
| 2643 | -0.775150 | -0.917003 | -0.599043 | 0.867729 | 0.647233 | -0.906376 | -0.946623 | 0.251651 | -0.317981 | -0.856580 | -0.15539 | -0.774318 | -0.944073 | 0.710815 | 0.358853 | 0.752953 | 2.702304 | -0.689563 | -0.089933 | 1.244292 |
2644 rows × 20 columns
Let's being optimizing the hyperparameters.
We will use optuna to make this process really easy and reliable.
import optuna
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
X = X_transformed_fe
y = target_ActualWeightFront
def objective(trial):
# Define the hyperparameter search space
params = {
"n_estimators": trial.suggest_int("n_estimators", 10, 1000),
"learning_rate": trial.suggest_loguniform("learning_rate", 0.01, 0.3),
"max_depth": trial.suggest_int("max_depth", 1, 10),
"reg_lambda": trial.suggest_loguniform("reg_lambda", 1e-9, 100),
"subsample": trial.suggest_float("subsample", 0.1, 1),
"colsample_bytree": trial.suggest_float("colsample_bytree", 0.1, 1),
"gamma": trial.suggest_float("gamma", 0, 1),
"min_child_weight": trial.suggest_int("min_child_weight", 1, 10),
}
# Split the data into train and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=42)
# Create and train the XGBRegressor with the suggested hyperparameters
model = XGBRegressor(**params, eval_metric='mae',random_state=42,n_jobs=1)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], early_stopping_rounds=50, verbose=False)
# Calculate the MAE on the validation set
y_pred = model.predict(X_val)
mae = mean_absolute_error(y_val, y_pred)
return mae
# Create a study object and specify that the goal is to minimize the objective function
study = optuna.create_study(direction="minimize") # We want to minimize the MAE
study.optimize(objective, n_trials=100)
# Get the best hyperparameters and their corresponding MAE
best_params = study.best_params
best_score = study.best_value
print("Best Hyperparameters:", best_params)
print("Best Score (MAE):", best_score)
[I 2023-08-24 21:37:15,634] A new study created in memory with name: no-name-7e8eefa7-d481-4023-86e8-3118d3736533
[I 2023-08-24 21:37:16,635] Trial 0 finished with value: 132.4071453872796 and parameters: {'n_estimators': 778, 'learning_rate': 0.04566242473875747, 'max_depth': 3, 'reg_lambda': 0.5835957689324223, 'subsample': 0.8579024686544837, 'colsample_bytree': 0.6518396226535519, 'gamma': 0.4926660807691057, 'min_child_weight': 7}. Best is trial 0 with value: 132.4071453872796.
[I 2023-08-24 21:37:17,127] Trial 1 finished with value: 125.43281643576826 and parameters: {'n_estimators': 515, 'learning_rate': 0.23491467286748102, 'max_depth': 4, 'reg_lambda': 0.16328659509960294, 'subsample': 0.8753910226197854, 'colsample_bytree': 0.7038019461183781, 'gamma': 0.4183235191057644, 'min_child_weight': 9}. Best is trial 1 with value: 125.43281643576826.
[I 2023-08-24 21:37:17,558] Trial 2 finished with value: 135.5872732013539 and parameters: {'n_estimators': 178, 'learning_rate': 0.03834613358805908, 'max_depth': 10, 'reg_lambda': 1.4484118111965656, 'subsample': 0.7816268862422202, 'colsample_bytree': 0.40808698044572067, 'gamma': 0.39963155713820864, 'min_child_weight': 7}. Best is trial 1 with value: 125.43281643576826.
[I 2023-08-24 21:37:18,483] Trial 3 finished with value: 132.70857972882555 and parameters: {'n_estimators': 884, 'learning_rate': 0.04835611501837917, 'max_depth': 5, 'reg_lambda': 4.062813961076248, 'subsample': 0.18725584879759472, 'colsample_bytree': 0.6563565441068906, 'gamma': 0.9047692372971008, 'min_child_weight': 2}. Best is trial 1 with value: 125.43281643576826.
[I 2023-08-24 21:37:18,533] Trial 4 finished with value: 2113.9746930100755 and parameters: {'n_estimators': 41, 'learning_rate': 0.039154814051259, 'max_depth': 1, 'reg_lambda': 2.3951979690535525, 'subsample': 0.4495753804042988, 'colsample_bytree': 0.787101520408692, 'gamma': 0.11595299260917791, 'min_child_weight': 3}. Best is trial 1 with value: 125.43281643576826.
[I 2023-08-24 21:37:18,827] Trial 5 finished with value: 142.09308829896096 and parameters: {'n_estimators': 893, 'learning_rate': 0.2214655779293194, 'max_depth': 2, 'reg_lambda': 1.8893190620594973, 'subsample': 0.4157732545791466, 'colsample_bytree': 0.2636768600669744, 'gamma': 0.7603688601700967, 'min_child_weight': 9}. Best is trial 1 with value: 125.43281643576826.
[I 2023-08-24 21:37:19,873] Trial 6 finished with value: 140.8227526763224 and parameters: {'n_estimators': 676, 'learning_rate': 0.021881471821074625, 'max_depth': 10, 'reg_lambda': 2.3422047333344067e-07, 'subsample': 0.2339219371634801, 'colsample_bytree': 0.18800371757749024, 'gamma': 0.21128796141900152, 'min_child_weight': 9}. Best is trial 1 with value: 125.43281643576826.
[I 2023-08-24 21:37:21,125] Trial 7 finished with value: 126.54438316671914 and parameters: {'n_estimators': 879, 'learning_rate': 0.0389904280064983, 'max_depth': 5, 'reg_lambda': 1.4567835664186264e-06, 'subsample': 0.827862799711435, 'colsample_bytree': 0.8298477528098921, 'gamma': 0.5016726585159476, 'min_child_weight': 5}. Best is trial 1 with value: 125.43281643576826.
[I 2023-08-24 21:37:23,140] Trial 8 finished with value: 127.91857633422543 and parameters: {'n_estimators': 723, 'learning_rate': 0.02545933982843321, 'max_depth': 7, 'reg_lambda': 0.01613257226273038, 'subsample': 0.6996558998095475, 'colsample_bytree': 0.8148251539719638, 'gamma': 0.5725206576316784, 'min_child_weight': 9}. Best is trial 1 with value: 125.43281643576826.
[I 2023-08-24 21:37:23,165] Trial 9 finished with value: 8723.411002132694 and parameters: {'n_estimators': 20, 'learning_rate': 0.01060863042706886, 'max_depth': 2, 'reg_lambda': 9.821450325011904, 'subsample': 0.23024556388926223, 'colsample_bytree': 0.49175589952144083, 'gamma': 0.9607273045095055, 'min_child_weight': 6}. Best is trial 1 with value: 125.43281643576826.
[I 2023-08-24 21:37:23,571] Trial 10 finished with value: 127.22424531643577 and parameters: {'n_estimators': 436, 'learning_rate': 0.26027780415517815, 'max_depth': 7, 'reg_lambda': 0.0018701442205215858, 'subsample': 0.9610890609392372, 'colsample_bytree': 0.9542696645437158, 'gamma': 0.019810948147209073, 'min_child_weight': 10}. Best is trial 1 with value: 125.43281643576826.
[I 2023-08-24 21:37:24,055] Trial 11 finished with value: 127.62992585996537 and parameters: {'n_estimators': 443, 'learning_rate': 0.10857823212925481, 'max_depth': 5, 'reg_lambda': 7.660758884408461e-07, 'subsample': 0.9725624124669809, 'colsample_bytree': 0.9903055231531217, 'gamma': 0.31326860236484316, 'min_child_weight': 4}. Best is trial 1 with value: 125.43281643576826.
[I 2023-08-24 21:37:24,913] Trial 12 finished with value: 127.99051233863351 and parameters: {'n_estimators': 541, 'learning_rate': 0.09727830958598613, 'max_depth': 4, 'reg_lambda': 3.5129132131749995e-05, 'subsample': 0.6665608044558613, 'colsample_bytree': 0.7731189013828259, 'gamma': 0.599888577286736, 'min_child_weight': 5}. Best is trial 1 with value: 125.43281643576826.
[I 2023-08-24 21:37:25,405] Trial 13 finished with value: 126.03880052542506 and parameters: {'n_estimators': 986, 'learning_rate': 0.09182526748345886, 'max_depth': 7, 'reg_lambda': 3.2495765048162434e-09, 'subsample': 0.8326357233970464, 'colsample_bytree': 0.6189429741728897, 'gamma': 0.357170790031503, 'min_child_weight': 1}. Best is trial 1 with value: 125.43281643576826.
[I 2023-08-24 21:37:25,901] Trial 14 finished with value: 124.94192773929471 and parameters: {'n_estimators': 997, 'learning_rate': 0.15130810686401724, 'max_depth': 8, 'reg_lambda': 1.228545322169782e-09, 'subsample': 0.6468054785776172, 'colsample_bytree': 0.5768366345064918, 'gamma': 0.31555877415318534, 'min_child_weight': 1}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:26,548] Trial 15 finished with value: 127.12612169395466 and parameters: {'n_estimators': 335, 'learning_rate': 0.17342552513692705, 'max_depth': 8, 'reg_lambda': 0.03275982325966444, 'subsample': 0.6021999568376306, 'colsample_bytree': 0.49886572533716333, 'gamma': 0.21491614902941425, 'min_child_weight': 1}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:27,040] Trial 16 finished with value: 129.34766854927582 and parameters: {'n_estimators': 551, 'learning_rate': 0.17234331057086902, 'max_depth': 9, 'reg_lambda': 3.041746635728718e-09, 'subsample': 0.559509304164145, 'colsample_bytree': 0.386863594629731, 'gamma': 0.27593218996539043, 'min_child_weight': 7}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:27,661] Trial 17 finished with value: 129.30547219379724 and parameters: {'n_estimators': 291, 'learning_rate': 0.2689946404503632, 'max_depth': 6, 'reg_lambda': 54.382798309821496, 'subsample': 0.7310685750898838, 'colsample_bytree': 0.5810311633446974, 'gamma': 0.4065854106345552, 'min_child_weight': 3}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:28,108] Trial 18 finished with value: 131.40859301204344 and parameters: {'n_estimators': 626, 'learning_rate': 0.2963397681364949, 'max_depth': 8, 'reg_lambda': 0.00033949064294032166, 'subsample': 0.6473234385427341, 'colsample_bytree': 0.6981658806409754, 'gamma': 0.1482523193285713, 'min_child_weight': 8}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:28,471] Trial 19 finished with value: 127.29369686712846 and parameters: {'n_estimators': 227, 'learning_rate': 0.14522763766714922, 'max_depth': 4, 'reg_lambda': 1.3505736702640705e-05, 'subsample': 0.9098958042345393, 'colsample_bytree': 0.7055939076358009, 'gamma': 0.6894163570889831, 'min_child_weight': 10}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:29,223] Trial 20 finished with value: 126.28532352015114 and parameters: {'n_estimators': 984, 'learning_rate': 0.07413785519030056, 'max_depth': 6, 'reg_lambda': 7.193334213603787e-08, 'subsample': 0.7463209822933622, 'colsample_bytree': 0.5611324505402174, 'gamma': 0.41173879404823976, 'min_child_weight': 4}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:29,646] Trial 21 finished with value: 127.75240942813288 and parameters: {'n_estimators': 952, 'learning_rate': 0.11994324642256102, 'max_depth': 7, 'reg_lambda': 1.1767466018724594e-09, 'subsample': 0.861606838355318, 'colsample_bytree': 0.6001269936570764, 'gamma': 0.3362459109779751, 'min_child_weight': 1}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:29,952] Trial 22 finished with value: 127.93821704778023 and parameters: {'n_estimators': 789, 'learning_rate': 0.19899958278162685, 'max_depth': 8, 'reg_lambda': 8.078922061229748e-09, 'subsample': 0.793561168667525, 'colsample_bytree': 0.499708058772754, 'gamma': 0.3408668097590281, 'min_child_weight': 2}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:30,696] Trial 23 finished with value: 127.8188341270466 and parameters: {'n_estimators': 996, 'learning_rate': 0.13779323967681825, 'max_depth': 9, 'reg_lambda': 1.644571347336594e-08, 'subsample': 0.9113612925586896, 'colsample_bytree': 0.6031829309851063, 'gamma': 0.2644199170082564, 'min_child_weight': 2}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:31,399] Trial 24 finished with value: 127.17609857131612 and parameters: {'n_estimators': 850, 'learning_rate': 0.07905879059679004, 'max_depth': 4, 'reg_lambda': 3.279426984842617e-08, 'subsample': 0.7856913006335435, 'colsample_bytree': 0.7281678496908335, 'gamma': 0.45504826807035675, 'min_child_weight': 1}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:31,775] Trial 25 finished with value: 127.74823013420969 and parameters: {'n_estimators': 639, 'learning_rate': 0.15584053836129788, 'max_depth': 6, 'reg_lambda': 1.0324808412445763e-09, 'subsample': 0.9802292936363333, 'colsample_bytree': 0.8828570295216268, 'gamma': 0.34596658642839867, 'min_child_weight': 3}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:32,408] Trial 26 finished with value: 129.31180878069898 and parameters: {'n_estimators': 786, 'learning_rate': 0.20035718587280305, 'max_depth': 9, 'reg_lambda': 8.977490603162365e-09, 'subsample': 0.6938292297428492, 'colsample_bytree': 0.7409471402421565, 'gamma': 0.4764061710433398, 'min_child_weight': 4}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:33,153] Trial 27 finished with value: 126.67401015428212 and parameters: {'n_estimators': 936, 'learning_rate': 0.12282767501032768, 'max_depth': 7, 'reg_lambda': 1.3091710090271623e-07, 'subsample': 0.8583142139258471, 'colsample_bytree': 0.6587728722515241, 'gamma': 0.5541906188173018, 'min_child_weight': 6}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:33,805] Trial 28 finished with value: 132.06898048842885 and parameters: {'n_estimators': 471, 'learning_rate': 0.0915818160375384, 'max_depth': 3, 'reg_lambda': 3.0240136734926108e-06, 'subsample': 0.7615495721964808, 'colsample_bytree': 0.8759651213263617, 'gamma': 0.41750021194833165, 'min_child_weight': 2}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:34,551] Trial 29 finished with value: 132.10209505864296 and parameters: {'n_estimators': 810, 'learning_rate': 0.06365378018955183, 'max_depth': 3, 'reg_lambda': 3.075498707965548e-08, 'subsample': 0.8614788132675466, 'colsample_bytree': 0.6550870329399731, 'gamma': 0.4879184030426139, 'min_child_weight': 8}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:35,079] Trial 30 finished with value: 125.50037512791246 and parameters: {'n_estimators': 726, 'learning_rate': 0.2342678663443616, 'max_depth': 8, 'reg_lambda': 0.11069152142306457, 'subsample': 0.6364723456460079, 'colsample_bytree': 0.6202235579891875, 'gamma': 0.3648502758139992, 'min_child_weight': 1}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:35,442] Trial 31 finished with value: 125.97189123504408 and parameters: {'n_estimators': 680, 'learning_rate': 0.21092503617278432, 'max_depth': 8, 'reg_lambda': 0.054406498286065044, 'subsample': 0.6339828884054357, 'colsample_bytree': 0.6211182926778845, 'gamma': 0.3699007721085756, 'min_child_weight': 1}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:35,988] Trial 32 finished with value: 126.10003984965365 and parameters: {'n_estimators': 731, 'learning_rate': 0.22556085420214375, 'max_depth': 10, 'reg_lambda': 0.22075025181980001, 'subsample': 0.6422659200143813, 'colsample_bytree': 0.5473009901649993, 'gamma': 0.3801001597057148, 'min_child_weight': 1}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:36,451] Trial 33 finished with value: 131.49591787035578 and parameters: {'n_estimators': 562, 'learning_rate': 0.29616535915168607, 'max_depth': 8, 'reg_lambda': 0.16547199912174967, 'subsample': 0.5904974425459516, 'colsample_bytree': 0.6755489682556939, 'gamma': 0.27713272176225157, 'min_child_weight': 2}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:36,840] Trial 34 finished with value: 126.23389286838791 and parameters: {'n_estimators': 707, 'learning_rate': 0.2302517191076446, 'max_depth': 9, 'reg_lambda': 0.0070570853182569835, 'subsample': 0.5169166182926936, 'colsample_bytree': 0.7369361951376336, 'gamma': 0.41569010064862805, 'min_child_weight': 3}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:37,338] Trial 35 finished with value: 126.67065614176637 and parameters: {'n_estimators': 616, 'learning_rate': 0.1700895480673002, 'max_depth': 8, 'reg_lambda': 0.5184641086504166, 'subsample': 0.7145590172052676, 'colsample_bytree': 0.6324981108174565, 'gamma': 0.5184779507402361, 'min_child_weight': 2}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:37,864] Trial 36 finished with value: 126.63194048134446 and parameters: {'n_estimators': 360, 'learning_rate': 0.19865963591983107, 'max_depth': 10, 'reg_lambda': 0.08694538127564799, 'subsample': 0.5111828015207219, 'colsample_bytree': 0.6852301917549287, 'gamma': 0.45259755387297074, 'min_child_weight': 8}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:38,226] Trial 37 finished with value: 127.72103520544711 and parameters: {'n_estimators': 497, 'learning_rate': 0.24869046937369327, 'max_depth': 6, 'reg_lambda': 0.8154092149790162, 'subsample': 0.6396487676063962, 'colsample_bytree': 0.6366047365057361, 'gamma': 0.3037704019047905, 'min_child_weight': 1}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:38,520] Trial 38 finished with value: 133.78171368269835 and parameters: {'n_estimators': 665, 'learning_rate': 0.139585866366377, 'max_depth': 5, 'reg_lambda': 0.055487274882132925, 'subsample': 0.41848757937267045, 'colsample_bytree': 0.5570471108610044, 'gamma': 0.22306362291912624, 'min_child_weight': 2}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:38,643] Trial 39 finished with value: 171.98024367325252 and parameters: {'n_estimators': 119, 'learning_rate': 0.20142547618458867, 'max_depth': 1, 'reg_lambda': 0.003686998021917772, 'subsample': 0.68993710660369, 'colsample_bytree': 0.436511353459194, 'gamma': 0.37579339831295067, 'min_child_weight': 3}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:39,162] Trial 40 finished with value: 126.42065737169395 and parameters: {'n_estimators': 383, 'learning_rate': 0.25047125589255326, 'max_depth': 9, 'reg_lambda': 0.0011585704890211242, 'subsample': 0.4728127379765559, 'colsample_bytree': 0.7482256313362939, 'gamma': 0.15458495453162724, 'min_child_weight': 7}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:39,611] Trial 41 finished with value: 126.68262702692066 and parameters: {'n_estimators': 903, 'learning_rate': 0.1691805003250567, 'max_depth': 7, 'reg_lambda': 0.026394462451432413, 'subsample': 0.7684966825687314, 'colsample_bytree': 0.6127117427065627, 'gamma': 0.36438609047202186, 'min_child_weight': 1}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:40,133] Trial 42 finished with value: 125.60597548016372 and parameters: {'n_estimators': 857, 'learning_rate': 0.12546314852907015, 'max_depth': 8, 'reg_lambda': 0.007183940573932222, 'subsample': 0.6065792848746795, 'colsample_bytree': 0.6391509764193508, 'gamma': 0.3080311787931148, 'min_child_weight': 1}. Best is trial 14 with value: 124.94192773929471.
[I 2023-08-24 21:37:40,681] Trial 43 finished with value: 124.78965655502203 and parameters: {'n_estimators': 858, 'learning_rate': 0.1301914704241828, 'max_depth': 8, 'reg_lambda': 0.010074944155120275, 'subsample': 0.599725649219366, 'colsample_bytree': 0.6956601219005432, 'gamma': 0.30697906029172484, 'min_child_weight': 1}. Best is trial 43 with value: 124.78965655502203.
[I 2023-08-24 21:37:41,185] Trial 44 finished with value: 138.90886236618388 and parameters: {'n_estimators': 840, 'learning_rate': 0.1194690794989917, 'max_depth': 2, 'reg_lambda': 0.010796484962367237, 'subsample': 0.5793816890340173, 'colsample_bytree': 0.7992601916648998, 'gamma': 0.3099112793794747, 'min_child_weight': 2}. Best is trial 43 with value: 124.78965655502203.
[I 2023-08-24 21:37:41,888] Trial 45 finished with value: 126.189182540932 and parameters: {'n_estimators': 909, 'learning_rate': 0.14092622528316182, 'max_depth': 10, 'reg_lambda': 0.0007643143265683949, 'subsample': 0.5360575920729431, 'colsample_bytree': 0.6860112521164702, 'gamma': 0.2467370808611326, 'min_child_weight': 1}. Best is trial 43 with value: 124.78965655502203.
[I 2023-08-24 21:37:42,461] Trial 46 finished with value: 125.23514616459383 and parameters: {'n_estimators': 748, 'learning_rate': 0.11108075777632465, 'max_depth': 8, 'reg_lambda': 0.004364429749935437, 'subsample': 0.6164735505650625, 'colsample_bytree': 0.7741222303427062, 'gamma': 0.18586918457369517, 'min_child_weight': 3}. Best is trial 43 with value: 124.78965655502203.
[I 2023-08-24 21:37:43,039] Trial 47 finished with value: 126.91608696079975 and parameters: {'n_estimators': 743, 'learning_rate': 0.10800585707732638, 'max_depth': 9, 'reg_lambda': 0.0001303685356377775, 'subsample': 0.6727644026139147, 'colsample_bytree': 0.784000545401681, 'gamma': 0.1904953489357061, 'min_child_weight': 3}. Best is trial 43 with value: 124.78965655502203.
[I 2023-08-24 21:37:43,730] Trial 48 finished with value: 129.25931424157744 and parameters: {'n_estimators': 819, 'learning_rate': 0.15967334567727418, 'max_depth': 4, 'reg_lambda': 0.0022539182331560996, 'subsample': 0.5569223305818481, 'colsample_bytree': 0.8404318302431052, 'gamma': 0.11026095283087317, 'min_child_weight': 5}. Best is trial 43 with value: 124.78965655502203.
[I 2023-08-24 21:37:44,170] Trial 49 finished with value: 124.47597090483312 and parameters: {'n_estimators': 754, 'learning_rate': 0.19021426230169275, 'max_depth': 7, 'reg_lambda': 3.306727865458625, 'subsample': 0.7209932624237528, 'colsample_bytree': 0.7638264064206766, 'gamma': 0.23532758520441605, 'min_child_weight': 4}. Best is trial 49 with value: 124.47597090483312.
[I 2023-08-24 21:37:44,671] Trial 50 finished with value: 124.37741557777078 and parameters: {'n_estimators': 753, 'learning_rate': 0.18730972755533032, 'max_depth': 7, 'reg_lambda': 4.150263201191068, 'subsample': 0.7214442686673199, 'colsample_bytree': 0.7713579237492315, 'gamma': 0.24558311289405715, 'min_child_weight': 4}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:37:45,138] Trial 51 finished with value: 125.7098121162626 and parameters: {'n_estimators': 581, 'learning_rate': 0.18059223601671676, 'max_depth': 7, 'reg_lambda': 4.355119520810934, 'subsample': 0.7231707713762874, 'colsample_bytree': 0.7652425178862647, 'gamma': 0.23725647139373546, 'min_child_weight': 4}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:37:45,615] Trial 52 finished with value: 125.2506518616184 and parameters: {'n_estimators': 749, 'learning_rate': 0.1516014963700012, 'max_depth': 7, 'reg_lambda': 1.7466315341996252, 'subsample': 0.6941528453805853, 'colsample_bytree': 0.7044794420002095, 'gamma': 0.18897408089386372, 'min_child_weight': 5}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:37:46,527] Trial 53 finished with value: 128.27068861185452 and parameters: {'n_estimators': 762, 'learning_rate': 0.14343091511975034, 'max_depth': 7, 'reg_lambda': 20.38590846452544, 'subsample': 0.6760926482881967, 'colsample_bytree': 0.8067087470731451, 'gamma': 0.18661634250468895, 'min_child_weight': 5}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:37:47,592] Trial 54 finished with value: 125.74003758658691 and parameters: {'n_estimators': 939, 'learning_rate': 0.10562626179784838, 'max_depth': 6, 'reg_lambda': 1.8440324083592354, 'subsample': 0.7200035934992797, 'colsample_bytree': 0.7147128741111464, 'gamma': 0.08953477362904216, 'min_child_weight': 6}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:37:48,429] Trial 55 finished with value: 127.71839578085643 and parameters: {'n_estimators': 881, 'learning_rate': 0.1581042037181086, 'max_depth': 7, 'reg_lambda': 9.333457239154471, 'subsample': 0.6110040931006008, 'colsample_bytree': 0.7770699220228973, 'gamma': 0.17839775215303752, 'min_child_weight': 4}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:37:48,914] Trial 56 finished with value: 127.16540066120906 and parameters: {'n_estimators': 771, 'learning_rate': 0.18388296619301872, 'max_depth': 8, 'reg_lambda': 0.3457323276553971, 'subsample': 0.7441842535381664, 'colsample_bytree': 0.7191697888375954, 'gamma': 0.2580112959152969, 'min_child_weight': 5}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:37:49,321] Trial 57 finished with value: 126.8254523673646 and parameters: {'n_estimators': 681, 'learning_rate': 0.1303480518644968, 'max_depth': 6, 'reg_lambda': 0.9472957449457589, 'subsample': 0.7983635594384347, 'colsample_bytree': 0.7505270547229714, 'gamma': 0.2143205517612441, 'min_child_weight': 4}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:37:49,810] Trial 58 finished with value: 126.38611362562972 and parameters: {'n_estimators': 808, 'learning_rate': 0.15257711217951989, 'max_depth': 7, 'reg_lambda': 2.4979307112482645, 'subsample': 0.695924168719525, 'colsample_bytree': 0.8476518743291745, 'gamma': 0.08044309520787929, 'min_child_weight': 4}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:37:51,287] Trial 59 finished with value: 127.36896228550063 and parameters: {'n_estimators': 852, 'learning_rate': 0.09836111373583013, 'max_depth': 8, 'reg_lambda': 31.801614749584456, 'subsample': 0.6609952539696674, 'colsample_bytree': 0.7055730369250286, 'gamma': 0.15175481937518404, 'min_child_weight': 5}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:37:51,882] Trial 60 finished with value: 125.48887284516688 and parameters: {'n_estimators': 908, 'learning_rate': 0.11730244814466667, 'max_depth': 9, 'reg_lambda': 0.2757846883608126, 'subsample': 0.8186247628843653, 'colsample_bytree': 0.8110855190652146, 'gamma': 0.27273079370021114, 'min_child_weight': 3}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:37:52,457] Trial 61 finished with value: 127.01004850834383 and parameters: {'n_estimators': 700, 'learning_rate': 0.18380038653838035, 'max_depth': 5, 'reg_lambda': 5.17592419859341, 'subsample': 0.764385571073037, 'colsample_bytree': 0.6809648116989024, 'gamma': 0.2171710509595398, 'min_child_weight': 6}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:37:52,949] Trial 62 finished with value: 125.37953351306675 and parameters: {'n_estimators': 638, 'learning_rate': 0.22067644788605653, 'max_depth': 7, 'reg_lambda': 1.3011722141814206, 'subsample': 0.7291339124454443, 'colsample_bytree': 0.7721187572483466, 'gamma': 0.2909904015976836, 'min_child_weight': 10}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:37:53,531] Trial 63 finished with value: 125.56025292230794 and parameters: {'n_estimators': 640, 'learning_rate': 0.21185808436499187, 'max_depth': 7, 'reg_lambda': 0.9897291127546658, 'subsample': 0.738579757973617, 'colsample_bytree': 0.7699563022916769, 'gamma': 0.2812343348320005, 'min_child_weight': 4}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:37:54,118] Trial 64 finished with value: 126.72391323598866 and parameters: {'n_estimators': 602, 'learning_rate': 0.15411394152311392, 'max_depth': 7, 'reg_lambda': 0.01964942567411227, 'subsample': 0.6939386996960284, 'colsample_bytree': 0.7320781399898583, 'gamma': 0.24525916698437117, 'min_child_weight': 5}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:37:55,744] Trial 65 finished with value: 128.96415991026447 and parameters: {'n_estimators': 757, 'learning_rate': 0.1292091522125983, 'max_depth': 6, 'reg_lambda': 85.98770870798518, 'subsample': 0.6161501958383468, 'colsample_bytree': 0.8207925484476177, 'gamma': 0.3294319485352685, 'min_child_weight': 10}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:37:56,301] Trial 66 finished with value: 127.09116100244017 and parameters: {'n_estimators': 971, 'learning_rate': 0.268791896141803, 'max_depth': 8, 'reg_lambda': 10.233536155265499, 'subsample': 0.5855221807340311, 'colsample_bytree': 0.6659513832845251, 'gamma': 0.1877866488353676, 'min_child_weight': 6}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:37:56,860] Trial 67 finished with value: 126.8051622520466 and parameters: {'n_estimators': 792, 'learning_rate': 0.18962802936767245, 'max_depth': 7, 'reg_lambda': 2.136956352148615, 'subsample': 0.650547553623773, 'colsample_bytree': 0.8691035766968243, 'gamma': 0.12570509539226, 'min_child_weight': 9}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:37:57,453] Trial 68 finished with value: 125.3771425338476 and parameters: {'n_estimators': 829, 'learning_rate': 0.21625555645142713, 'max_depth': 8, 'reg_lambda': 0.31641146123709396, 'subsample': 0.6758175051437016, 'colsample_bytree': 0.7861367236113043, 'gamma': 0.314508456530008, 'min_child_weight': 3}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:37:58,103] Trial 69 finished with value: 126.76625103313917 and parameters: {'n_estimators': 827, 'learning_rate': 0.17260877195104368, 'max_depth': 9, 'reg_lambda': 0.1436988064416842, 'subsample': 0.6717444643552086, 'colsample_bytree': 0.9190047351584717, 'gamma': 0.31724172134638773, 'min_child_weight': 3}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:37:58,613] Trial 70 finished with value: 126.43052631061083 and parameters: {'n_estimators': 878, 'learning_rate': 0.13534441383641313, 'max_depth': 8, 'reg_lambda': 0.020229149831253634, 'subsample': 0.6182999944288874, 'colsample_bytree': 0.7129336029923017, 'gamma': 0.24703668993557445, 'min_child_weight': 4}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:37:59,274] Trial 71 finished with value: 125.99202760941436 and parameters: {'n_estimators': 725, 'learning_rate': 0.22221988649994678, 'max_depth': 8, 'reg_lambda': 0.5543939062984486, 'subsample': 0.7155663207614813, 'colsample_bytree': 0.7879948765518512, 'gamma': 0.28693516165868166, 'min_child_weight': 3}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:37:59,702] Trial 72 finished with value: 125.15466708320214 and parameters: {'n_estimators': 658, 'learning_rate': 0.20078630473606993, 'max_depth': 7, 'reg_lambda': 0.32851245891714786, 'subsample': 0.7454767835897738, 'colsample_bytree': 0.7718127016296609, 'gamma': 0.3320238057130793, 'min_child_weight': 4}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:38:00,251] Trial 73 finished with value: 127.05605148968829 and parameters: {'n_estimators': 774, 'learning_rate': 0.19418204085947202, 'max_depth': 8, 'reg_lambda': 0.3996205352218638, 'subsample': 0.6624957021108089, 'colsample_bytree': 0.7415488926723343, 'gamma': 0.38956918003500157, 'min_child_weight': 4}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:38:00,845] Trial 74 finished with value: 127.00477580880037 and parameters: {'n_estimators': 698, 'learning_rate': 0.1568657594683309, 'max_depth': 8, 'reg_lambda': 0.060441513660355585, 'subsample': 0.7749597340694521, 'colsample_bytree': 0.8241096457583382, 'gamma': 0.3274662702144984, 'min_child_weight': 3}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:38:01,242] Trial 75 finished with value: 129.55909433052582 and parameters: {'n_estimators': 789, 'learning_rate': 0.2414438576927788, 'max_depth': 7, 'reg_lambda': 0.1953128707873193, 'subsample': 0.7526249917836912, 'colsample_bytree': 0.6977447054009538, 'gamma': 0.2205831466453665, 'min_child_weight': 5}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:38:01,925] Trial 76 finished with value: 125.8181429077456 and parameters: {'n_estimators': 661, 'learning_rate': 0.168233031570811, 'max_depth': 6, 'reg_lambda': 3.0516499799315895, 'subsample': 0.6895505116263969, 'colsample_bytree': 0.7553510761563896, 'gamma': 0.33647993674148724, 'min_child_weight': 2}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:38:02,537] Trial 77 finished with value: 126.47672362051323 and parameters: {'n_estimators': 745, 'learning_rate': 0.1463984477488365, 'max_depth': 9, 'reg_lambda': 0.03247273528508594, 'subsample': 0.6281318762788439, 'colsample_bytree': 0.6562427699236442, 'gamma': 0.26687154949887565, 'min_child_weight': 4}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:38:02,982] Trial 78 finished with value: 127.42978343435138 and parameters: {'n_estimators': 957, 'learning_rate': 0.27764832732633016, 'max_depth': 7, 'reg_lambda': 0.09349851795601326, 'subsample': 0.7982234742320704, 'colsample_bytree': 0.79842660876484, 'gamma': 0.3474105977813781, 'min_child_weight': 3}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:38:03,407] Trial 79 finished with value: 126.75795148181675 and parameters: {'n_estimators': 867, 'learning_rate': 0.19991996322767266, 'max_depth': 6, 'reg_lambda': 0.7856054483056987, 'subsample': 0.7115101567074824, 'colsample_bytree': 0.8392228276813398, 'gamma': 0.29680967590278323, 'min_child_weight': 2}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:38:03,738] Trial 80 finished with value: 127.3768448913728 and parameters: {'n_estimators': 523, 'learning_rate': 0.24538116887519784, 'max_depth': 8, 'reg_lambda': 0.011360681555715329, 'subsample': 0.603146724150795, 'colsample_bytree': 0.7282256062059301, 'gamma': 0.39194523797990655, 'min_child_weight': 4}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:38:04,196] Trial 81 finished with value: 125.74013106108312 and parameters: {'n_estimators': 700, 'learning_rate': 0.22622359532609446, 'max_depth': 7, 'reg_lambda': 1.6583776833047204, 'subsample': 0.7408298303973654, 'colsample_bytree': 0.7699584268356066, 'gamma': 0.293069437280325, 'min_child_weight': 5}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:38:04,734] Trial 82 finished with value: 126.3370628837374 and parameters: {'n_estimators': 598, 'learning_rate': 0.1815708451834071, 'max_depth': 8, 'reg_lambda': 1.0442256083104078, 'subsample': 0.7246567811435929, 'colsample_bytree': 0.7917220874690766, 'gamma': 0.23972452960367874, 'min_child_weight': 7}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:38:05,488] Trial 83 finished with value: 124.51314669592254 and parameters: {'n_estimators': 647, 'learning_rate': 0.20848651602110835, 'max_depth': 7, 'reg_lambda': 5.864365152232079, 'subsample': 0.6364316782214184, 'colsample_bytree': 0.7655542658021014, 'gamma': 0.2711321099798909, 'min_child_weight': 2}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:38:06,052] Trial 84 finished with value: 126.48625309941751 and parameters: {'n_estimators': 832, 'learning_rate': 0.1690307086874509, 'max_depth': 8, 'reg_lambda': 5.93475918201282, 'subsample': 0.573688043254779, 'colsample_bytree': 0.7478568850858001, 'gamma': 0.20126602669309385, 'min_child_weight': 2}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:38:06,529] Trial 85 finished with value: 127.28465074976386 and parameters: {'n_estimators': 721, 'learning_rate': 0.20888459703364087, 'max_depth': 7, 'reg_lambda': 14.014744516398116, 'subsample': 0.6393126693913412, 'colsample_bytree': 0.6944805604865388, 'gamma': 0.1703468601681579, 'min_child_weight': 3}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:38:06,979] Trial 86 finished with value: 126.42284664278968 and parameters: {'n_estimators': 673, 'learning_rate': 0.14306035021762123, 'max_depth': 7, 'reg_lambda': 3.9421380510283615, 'subsample': 0.6503029203741412, 'colsample_bytree': 0.6735333164076037, 'gamma': 0.25989221355957337, 'min_child_weight': 1}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:38:07,404] Trial 87 finished with value: 127.29731654400189 and parameters: {'n_estimators': 798, 'learning_rate': 0.26844306954518526, 'max_depth': 9, 'reg_lambda': 0.2633658232348248, 'subsample': 0.6805990259879263, 'colsample_bytree': 0.8074361001735342, 'gamma': 0.20703546007308973, 'min_child_weight': 2}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:38:08,137] Trial 88 finished with value: 128.46280575999685 and parameters: {'n_estimators': 927, 'learning_rate': 0.11415885383382905, 'max_depth': 6, 'reg_lambda': 21.525473510806762, 'subsample': 0.7072340108487369, 'colsample_bytree': 0.5829793503225817, 'gamma': 0.2361534423486446, 'min_child_weight': 3}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:38:08,739] Trial 89 finished with value: 127.28602703872797 and parameters: {'n_estimators': 759, 'learning_rate': 0.1299849657751349, 'max_depth': 8, 'reg_lambda': 3.738917271436467e-07, 'subsample': 0.5945044574324718, 'colsample_bytree': 0.849902122920978, 'gamma': 0.3523559007838637, 'min_child_weight': 2}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:38:09,291] Trial 90 finished with value: 125.4523784339578 and parameters: {'n_estimators': 658, 'learning_rate': 0.20800052943229816, 'max_depth': 6, 'reg_lambda': 0.45507707057474645, 'subsample': 0.6586164581540709, 'colsample_bytree': 0.7318421729668814, 'gamma': 0.17013744730350414, 'min_child_weight': 1}. Best is trial 50 with value: 124.37741557777078.
[I 2023-08-24 21:38:09,721] Trial 91 finished with value: 123.8956308052582 and parameters: {'n_estimators': 637, 'learning_rate': 0.22583163079394733, 'max_depth': 7, 'reg_lambda': 1.8340878558028564, 'subsample': 0.6267325025967294, 'colsample_bytree': 0.7659296394864177, 'gamma': 0.30762619427723126, 'min_child_weight': 4}. Best is trial 91 with value: 123.8956308052582.
[I 2023-08-24 21:38:10,185] Trial 92 finished with value: 127.85876987562972 and parameters: {'n_estimators': 565, 'learning_rate': 0.1873680345617087, 'max_depth': 7, 'reg_lambda': 8.536272933680042, 'subsample': 0.5660131077025174, 'colsample_bytree': 0.7646412761448776, 'gamma': 0.31756755241730106, 'min_child_weight': 4}. Best is trial 91 with value: 123.8956308052582.
[I 2023-08-24 21:38:10,644] Trial 93 finished with value: 128.0737366183879 and parameters: {'n_estimators': 737, 'learning_rate': 0.24214788271188034, 'max_depth': 7, 'reg_lambda': 2.764087111070568, 'subsample': 0.6286314070387166, 'colsample_bytree': 0.718368594103462, 'gamma': 0.2678402131961928, 'min_child_weight': 4}. Best is trial 91 with value: 123.8956308052582.
[I 2023-08-24 21:38:11,175] Trial 94 finished with value: 124.6392831490082 and parameters: {'n_estimators': 709, 'learning_rate': 0.16573328646523308, 'max_depth': 8, 'reg_lambda': 1.7610801964780194, 'subsample': 0.6814160855107918, 'colsample_bytree': 0.7847244683988754, 'gamma': 0.37243774134266094, 'min_child_weight': 4}. Best is trial 91 with value: 123.8956308052582.
[I 2023-08-24 21:38:11,794] Trial 95 finished with value: 126.58350101346032 and parameters: {'n_estimators': 624, 'learning_rate': 0.16293679848061535, 'max_depth': 7, 'reg_lambda': 6.630934995851695, 'subsample': 0.5439611093002767, 'colsample_bytree': 0.756370251624473, 'gamma': 0.430318001886578, 'min_child_weight': 5}. Best is trial 91 with value: 123.8956308052582.
[I 2023-08-24 21:38:12,377] Trial 96 finished with value: 126.92371620159005 and parameters: {'n_estimators': 693, 'learning_rate': 0.14674176872561442, 'max_depth': 9, 'reg_lambda': 1.5018518693070424, 'subsample': 0.7044468371861795, 'colsample_bytree': 0.8193694128251744, 'gamma': 0.36822843471770816, 'min_child_weight': 4}. Best is trial 91 with value: 123.8956308052582.
[I 2023-08-24 21:38:12,873] Trial 97 finished with value: 126.2421739707966 and parameters: {'n_estimators': 651, 'learning_rate': 0.19165136077417266, 'max_depth': 8, 'reg_lambda': 4.146283402475353, 'subsample': 0.6022489145452287, 'colsample_bytree': 0.6887283555095808, 'gamma': 0.34170181665465577, 'min_child_weight': 5}. Best is trial 91 with value: 123.8956308052582.
[I 2023-08-24 21:38:13,640] Trial 98 finished with value: 126.79128251928526 and parameters: {'n_estimators': 720, 'learning_rate': 0.17484553918460102, 'max_depth': 7, 'reg_lambda': 13.335601803812068, 'subsample': 0.6216060636846312, 'colsample_bytree': 0.6389219415297601, 'gamma': 0.22594012123589255, 'min_child_weight': 4}. Best is trial 91 with value: 123.8956308052582.
[I 2023-08-24 21:38:14,077] Trial 99 finished with value: 127.43140816868703 and parameters: {'n_estimators': 590, 'learning_rate': 0.13430577539630337, 'max_depth': 6, 'reg_lambda': 0.7523074930100915, 'subsample': 0.6537035477441075, 'colsample_bytree': 0.7080497097831233, 'gamma': 0.20074564472503695, 'min_child_weight': 1}. Best is trial 91 with value: 123.8956308052582.
Best Hyperparameters: {'n_estimators': 637, 'learning_rate': 0.22583163079394733, 'max_depth': 7, 'reg_lambda': 1.8340878558028564, 'subsample': 0.6267325025967294, 'colsample_bytree': 0.7659296394864177, 'gamma': 0.30762619427723126, 'min_child_weight': 4}
Best Score (MAE): 123.8956308052582
The best parameters appear to be:
Feature Engineer Total Model
¶Let's get a baseline for how well our model is performing.
X = training.drop(columns= targetColumns + ['TruckSID'])
y = target_ActualWeightTotal
X_transformed = data_prep_pipeline.fit_transform(X, y)
X_train, X_test, y_train, y_test = train_test_split(X_transformed, y, test_size=0.3, random_state=1)
model = XGBRegressor()
scores = -cross_val_score(model, X_train, y_train, cv=5, scoring="neg_mean_absolute_error")
print(scores)
print("MAE score: {}".format(np.mean(scores)))
[306.28804635 280.11694204 300.64648174 289.98769003 280.72701119] MAE score: 291.5532342694257
X_train_fe = FE(X_train)
X_test_fe = FE(X_test)
model = XGBRegressor(random_state=42)
model.fit(X_train_fe, y_train)
scores = -cross_val_score(model, X_train_fe, y_train, cv=5, scoring="neg_mean_absolute_error")
print(scores)
print("Mean:", scores.mean())
xgb.plot_importance(model)
plt.show()
[307.162339 278.79971759 297.96208562 289.35171822 288.43561286] Mean: 292.3422946579392
feature_importances = model.feature_importances_
# Create a DataFrame for better manipulation
importance_df = pd.DataFrame({
'Feature': X_train.columns,
'Importance': feature_importances
})
importance_df_sorted = importance_df.sort_values(by='Importance', ascending=False)
importance_df_sorted.to_csv('importance_df_sorted.csv', index=False)
importance_df_sorted
| Feature | Importance | |
|---|---|---|
| 17 | EngineFamily | 0.610977 |
| 18 | TransmissionFamily | 0.193234 |
| 1 | Transmission | 0.058836 |
| 12 | RearWheels | 0.033532 |
| 16 | TagAxle | 0.018143 |
| 13 | RearTires | 0.011649 |
| 27 | sum_WheelBase_per_Engine | 0.007636 |
| 4 | Overhang | 0.007139 |
| 6 | Liner | 0.006805 |
| 21 | WheelBase_squared | 0.006624 |
| 24 | WheelBase_to_Overhang | 0.006261 |
| 0 | Engine | 0.006041 |
| 3 | WheelBase | 0.003928 |
| 11 | FrontSusp | 0.003525 |
| 28 | max_Overhang_per_Transmission | 0.003510 |
| 15 | FrontTires | 0.003481 |
| 22 | Overhang_squared | 0.002605 |
| 23 | Front_to_Rear_Wheels | 0.002384 |
| 9 | RearAxels | 0.002239 |
| 8 | Cab | 0.001929 |
| 14 | FrontWheels | 0.001741 |
| 19 | Engine_Transmission | 0.001671 |
| 10 | RearSusp | 0.001582 |
| 26 | avg_Overhang_per_EngineFamily | 0.001364 |
| 2 | FrontAxlePosition | 0.001027 |
| 5 | FrameRails | 0.001025 |
| 7 | FrontEndExt | 0.000698 |
| 20 | TransmissionFamily_EngineFamily | 0.000417 |
| 25 | avg_WheelBase_per_TransmissionFamily | 0.000000 |
model = xgb.train({"learning_rate": 0.1}, xgb.DMatrix(X_train_fe, label=y_train), 100)
# Specifically for XGBoost, you can use the TreeExplainer which works best for tree-based models
explainer = shap.TreeExplainer(model)
# Compute SHAP values for a sample of the test set
shap_values = explainer.shap_values(X_test_fe)
# Visualize the first prediction's explanation
shap.initjs()
shap.force_plot(explainer.expected_value, shap_values[0], X_test_fe.iloc[0])
shap.summary_plot(shap_values, X_test_fe)
# For a dependence plot for a specific feature
for i in range(5):
feature_name = X_test.columns[i] # get the name of the first feature
shap.dependence_plot(feature_name, shap_values, X_test_fe)
selected_features = importance_df[importance_df['Importance'] >= 0.0001]['Feature'].tolist()
# Subset the dataset
X_train_selected = X_train[selected_features]
X_test_selected = X_test[selected_features]
model = XGBRegressor(random_state=42)
model.fit(X_train_selected, y_train)
y_pred = model.predict(X_test_selected)
scores = -cross_val_score(model, X_train_selected, y_train, cv=5, scoring="neg_mean_absolute_error")
print(scores)
print("Mean:", scores.mean())
[307.162339 278.79971759 297.96208562 289.35171822 288.43561286] Mean: 292.3422946579392
pca = PCA(n_components=0.95) # Retain 95% of the variance
X_train_pca = pca.fit_transform(X_train_selected)
X_val_pca = pca.transform(X_test_selected)
model = XGBRegressor(random_state=42) # or xgb.XGBClassifier() for classification
model.fit(X_train_pca, y_train)
y_pred = model.predict(X_val_pca)
scores = -cross_val_score(model, X_train_selected, y_train, cv=5, scoring="neg_mean_absolute_error")
print(scores)
print("Mean:", scores.mean())
[307.162339 278.79971759 297.96208562 289.35171822 288.43561286] Mean: 292.3422946579392
I was unsuccessful in creating a useful combination of features, adding extra dimensions didn't help the model and reducing existing dimensions didn't help either.
X_train_selected.to_csv('X_train_selected_total.csv', index=False)
X = X_transformed
y = target_ActualWeightTotal
def objective(trial):
# Define the hyperparameter search space
params = {
"n_estimators": trial.suggest_int("n_estimators", 10, 1000),
"learning_rate": trial.suggest_loguniform("learning_rate", 0.01, 0.3),
"max_depth": trial.suggest_int("max_depth", 1, 10),
"reg_lambda": trial.suggest_loguniform("reg_lambda", 1e-9, 100),
"subsample": trial.suggest_float("subsample", 0.1, 1),
"colsample_bytree": trial.suggest_float("colsample_bytree", 0.1, 1),
"gamma": trial.suggest_float("gamma", 0, 1),
"min_child_weight": trial.suggest_int("min_child_weight", 1, 10),
}
# Split the data into train and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=42)
# Create and train the XGBRegressor with the suggested hyperparameters
model = XGBRegressor(**params, eval_metric='mae',n_jobs=1,random_state=42)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], early_stopping_rounds=50, verbose=False)
# Calculate the MAE on the validation set
y_pred = model.predict(X_val)
mae = mean_absolute_error(y_val, y_pred)
return mae
# Create a study object and specify that the goal is to minimize the objective function
study = optuna.create_study(direction="minimize") # We want to minimize the MAE
study.optimize(objective, n_trials=100)
# Get the best hyperparameters and their corresponding MAE
best_params = study.best_params
best_score = study.best_value
print("Best Hyperparameters:", best_params)
print("Best Score (MAE):", best_score)
[I 2023-08-24 21:38:24,047] A new study created in memory with name: no-name-ca58702b-b89b-4fce-89e2-c279cd59fb1b
[I 2023-08-24 21:38:24,223] Trial 0 finished with value: 503.87475155462846 and parameters: {'n_estimators': 183, 'learning_rate': 0.03460310884107015, 'max_depth': 1, 'reg_lambda': 8.459065667850058e-07, 'subsample': 0.52198747648067, 'colsample_bytree': 0.1323594584088897, 'gamma': 0.7020439976815452, 'min_child_weight': 8}. Best is trial 0 with value: 503.87475155462846.
[I 2023-08-24 21:38:24,538] Trial 1 finished with value: 307.60744770347924 and parameters: {'n_estimators': 266, 'learning_rate': 0.08582965349014518, 'max_depth': 4, 'reg_lambda': 1.5603737076552077e-06, 'subsample': 0.2702208246996484, 'colsample_bytree': 0.8367524461810938, 'gamma': 0.8732546095813712, 'min_child_weight': 4}. Best is trial 1 with value: 307.60744770347924.
[I 2023-08-24 21:38:25,186] Trial 2 finished with value: 343.12041482997483 and parameters: {'n_estimators': 457, 'learning_rate': 0.01669704051905962, 'max_depth': 3, 'reg_lambda': 0.007347039573603694, 'subsample': 0.2677775949077245, 'colsample_bytree': 0.8183081675138126, 'gamma': 0.6999865133216777, 'min_child_weight': 7}. Best is trial 1 with value: 307.60744770347924.
[I 2023-08-24 21:38:26,884] Trial 3 finished with value: 320.6799666935611 and parameters: {'n_estimators': 926, 'learning_rate': 0.011794523449264661, 'max_depth': 3, 'reg_lambda': 0.000245475032093513, 'subsample': 0.52867131847601, 'colsample_bytree': 0.970469194695121, 'gamma': 0.36875815825087366, 'min_child_weight': 6}. Best is trial 1 with value: 307.60744770347924.
[I 2023-08-24 21:38:27,700] Trial 4 finished with value: 328.6597552936083 and parameters: {'n_estimators': 657, 'learning_rate': 0.015540552013706887, 'max_depth': 3, 'reg_lambda': 1.5537288821449102e-08, 'subsample': 0.8653645697399861, 'colsample_bytree': 0.6009729167657036, 'gamma': 0.3507613787063616, 'min_child_weight': 1}. Best is trial 1 with value: 307.60744770347924.
[I 2023-08-24 21:38:28,210] Trial 5 finished with value: 291.5666854435611 and parameters: {'n_estimators': 407, 'learning_rate': 0.08658553956738145, 'max_depth': 4, 'reg_lambda': 4.7345639085361925e-07, 'subsample': 0.9529558015145848, 'colsample_bytree': 0.3024623277710474, 'gamma': 0.9482554848421325, 'min_child_weight': 4}. Best is trial 5 with value: 291.5666854435611.
[I 2023-08-24 21:38:28,708] Trial 6 finished with value: 346.3486291227172 and parameters: {'n_estimators': 408, 'learning_rate': 0.050360688497556205, 'max_depth': 6, 'reg_lambda': 30.161356874006668, 'subsample': 0.8617818473208314, 'colsample_bytree': 0.14869488509115344, 'gamma': 0.5853643113687939, 'min_child_weight': 9}. Best is trial 5 with value: 291.5666854435611.
[I 2023-08-24 21:38:29,158] Trial 7 finished with value: 469.22744189822106 and parameters: {'n_estimators': 360, 'learning_rate': 0.01205245158400343, 'max_depth': 3, 'reg_lambda': 2.245099789645699e-07, 'subsample': 0.9502770181408795, 'colsample_bytree': 0.23026170094468384, 'gamma': 0.9991234103420932, 'min_child_weight': 6}. Best is trial 5 with value: 291.5666854435611.
[I 2023-08-24 21:38:29,237] Trial 8 finished with value: 3861.7185298429627 and parameters: {'n_estimators': 36, 'learning_rate': 0.04185734554297638, 'max_depth': 5, 'reg_lambda': 1.8922810517443854e-07, 'subsample': 0.7328108166709979, 'colsample_bytree': 0.3805326370500097, 'gamma': 0.8220061575608637, 'min_child_weight': 10}. Best is trial 5 with value: 291.5666854435611.
[I 2023-08-24 21:38:30,167] Trial 9 finished with value: 297.8410503089578 and parameters: {'n_estimators': 578, 'learning_rate': 0.03274400221791503, 'max_depth': 4, 'reg_lambda': 7.308456985962609e-07, 'subsample': 0.38790819667753573, 'colsample_bytree': 0.44358230100438323, 'gamma': 0.7147952401286152, 'min_child_weight': 9}. Best is trial 5 with value: 291.5666854435611.
[I 2023-08-24 21:38:30,690] Trial 10 finished with value: 287.42760892238664 and parameters: {'n_estimators': 718, 'learning_rate': 0.22933266083251272, 'max_depth': 9, 'reg_lambda': 1.6633234757988981e-09, 'subsample': 0.7049368159235926, 'colsample_bytree': 0.31842864185601794, 'gamma': 0.04826145816750316, 'min_child_weight': 3}. Best is trial 10 with value: 287.42760892238664.
[I 2023-08-24 21:38:31,141] Trial 11 finished with value: 285.3675060512437 and parameters: {'n_estimators': 772, 'learning_rate': 0.2600216176334391, 'max_depth': 10, 'reg_lambda': 1.1270623281253282e-09, 'subsample': 0.7310007031926059, 'colsample_bytree': 0.29059012407380425, 'gamma': 0.011619588168750201, 'min_child_weight': 3}. Best is trial 11 with value: 285.3675060512437.
[I 2023-08-24 21:38:31,659] Trial 12 finished with value: 281.53009263814545 and parameters: {'n_estimators': 802, 'learning_rate': 0.2868624965544731, 'max_depth': 10, 'reg_lambda': 1.2858055748963844e-09, 'subsample': 0.6929442034265967, 'colsample_bytree': 0.4779897434807927, 'gamma': 0.07944427798182607, 'min_child_weight': 2}. Best is trial 12 with value: 281.53009263814545.
[I 2023-08-24 21:38:32,233] Trial 13 finished with value: 283.4363463279282 and parameters: {'n_estimators': 886, 'learning_rate': 0.27348498062253, 'max_depth': 10, 'reg_lambda': 1.1067329544854548e-09, 'subsample': 0.6660951503139101, 'colsample_bytree': 0.5241730085569677, 'gamma': 0.009341115633953123, 'min_child_weight': 1}. Best is trial 12 with value: 281.53009263814545.
[I 2023-08-24 21:38:32,530] Trial 14 finished with value: 276.23720752322106 and parameters: {'n_estimators': 998, 'learning_rate': 0.29915511444217346, 'max_depth': 8, 'reg_lambda': 1.0411830447870452e-09, 'subsample': 0.6306025795570318, 'colsample_bytree': 0.5362570698131197, 'gamma': 0.11336897948068983, 'min_child_weight': 1}. Best is trial 14 with value: 276.23720752322106.
[I 2023-08-24 21:38:32,779] Trial 15 finished with value: 311.20728461508185 and parameters: {'n_estimators': 983, 'learning_rate': 0.18033823697611487, 'max_depth': 8, 'reg_lambda': 1.6906308168756473e-08, 'subsample': 0.11913104692330123, 'colsample_bytree': 0.59110148979193, 'gamma': 0.16026130463117214, 'min_child_weight': 2}. Best is trial 14 with value: 276.23720752322106.
[I 2023-08-24 21:38:33,305] Trial 16 finished with value: 278.1858149992128 and parameters: {'n_estimators': 825, 'learning_rate': 0.1440033863673198, 'max_depth': 7, 'reg_lambda': 2.3297585797411725e-05, 'subsample': 0.5886233633918191, 'colsample_bytree': 0.48539098141685694, 'gamma': 0.17946885570146187, 'min_child_weight': 1}. Best is trial 14 with value: 276.23720752322106.
[I 2023-08-24 21:38:33,873] Trial 17 finished with value: 281.16425436870276 and parameters: {'n_estimators': 998, 'learning_rate': 0.1551142492739507, 'max_depth': 7, 'reg_lambda': 3.9446199422950445e-05, 'subsample': 0.586356888319233, 'colsample_bytree': 0.6430166238301531, 'gamma': 0.22251866811655746, 'min_child_weight': 4}. Best is trial 14 with value: 276.23720752322106.
[I 2023-08-24 21:38:34,333] Trial 18 finished with value: 285.17326850795024 and parameters: {'n_estimators': 851, 'learning_rate': 0.1420885585516838, 'max_depth': 7, 'reg_lambda': 2.6325353995211063e-05, 'subsample': 0.45727001646082466, 'colsample_bytree': 0.4227026399606507, 'gamma': 0.1776967524919919, 'min_child_weight': 1}. Best is trial 14 with value: 276.23720752322106.
[I 2023-08-24 21:38:34,879] Trial 19 finished with value: 278.1543854297859 and parameters: {'n_estimators': 587, 'learning_rate': 0.10801186149526924, 'max_depth': 8, 'reg_lambda': 0.005418123253574687, 'subsample': 0.6040591446928526, 'colsample_bytree': 0.6754565660864857, 'gamma': 0.2991834812471706, 'min_child_weight': 2}. Best is trial 14 with value: 276.23720752322106.
[I 2023-08-24 21:38:35,314] Trial 20 finished with value: 279.3540641727015 and parameters: {'n_estimators': 586, 'learning_rate': 0.10741911185263471, 'max_depth': 8, 'reg_lambda': 0.0032654706491795143, 'subsample': 0.6287943831753671, 'colsample_bytree': 0.6651570664150814, 'gamma': 0.3254494245690571, 'min_child_weight': 5}. Best is trial 14 with value: 276.23720752322106.
[I 2023-08-24 21:38:35,704] Trial 21 finished with value: 279.09550510665935 and parameters: {'n_estimators': 688, 'learning_rate': 0.180524699839149, 'max_depth': 8, 'reg_lambda': 0.025498723376927247, 'subsample': 0.5962268645993603, 'colsample_bytree': 0.5193985174899024, 'gamma': 0.26935666589792595, 'min_child_weight': 2}. Best is trial 14 with value: 276.23720752322106.
[I 2023-08-24 21:38:36,127] Trial 22 finished with value: 279.2226587098552 and parameters: {'n_estimators': 546, 'learning_rate': 0.11594188615771885, 'max_depth': 7, 'reg_lambda': 0.06382145996196928, 'subsample': 0.4889828189974613, 'colsample_bytree': 0.6893361108761109, 'gamma': 0.13851497049769423, 'min_child_weight': 1}. Best is trial 14 with value: 276.23720752322106.
[I 2023-08-24 21:38:36,461] Trial 23 finished with value: 282.2157748051795 and parameters: {'n_estimators': 803, 'learning_rate': 0.20950291448791833, 'max_depth': 6, 'reg_lambda': 9.010695437000935e-06, 'subsample': 0.5916354416847946, 'colsample_bytree': 0.5362615138704698, 'gamma': 0.2642117289027355, 'min_child_weight': 3}. Best is trial 14 with value: 276.23720752322106.
[I 2023-08-24 21:38:37,069] Trial 24 finished with value: 280.09062106423175 and parameters: {'n_estimators': 935, 'learning_rate': 0.141507559008247, 'max_depth': 9, 'reg_lambda': 0.0004559781334346122, 'subsample': 0.7874516798569382, 'colsample_bytree': 0.7302482812736135, 'gamma': 0.4402891737261523, 'min_child_weight': 2}. Best is trial 14 with value: 276.23720752322106.
[I 2023-08-24 21:38:37,528] Trial 25 finished with value: 275.5025570194427 and parameters: {'n_estimators': 623, 'learning_rate': 0.19561243660747582, 'max_depth': 9, 'reg_lambda': 0.17273368678437587, 'subsample': 0.6377773512823068, 'colsample_bytree': 0.5801202643323335, 'gamma': 0.12233733494713053, 'min_child_weight': 1}. Best is trial 25 with value: 275.5025570194427.
[I 2023-08-24 21:38:37,980] Trial 26 finished with value: 277.9574457355951 and parameters: {'n_estimators': 614, 'learning_rate': 0.217000386191815, 'max_depth': 9, 'reg_lambda': 0.268216139801137, 'subsample': 0.6552491880379386, 'colsample_bytree': 0.5427747713022606, 'gamma': 0.09094291190515338, 'min_child_weight': 3}. Best is trial 25 with value: 275.5025570194427.
[I 2023-08-24 21:38:38,460] Trial 27 finished with value: 281.32694918923175 and parameters: {'n_estimators': 645, 'learning_rate': 0.20896863980045055, 'max_depth': 9, 'reg_lambda': 0.357614454303984, 'subsample': 0.6559706797109693, 'colsample_bytree': 0.5674021290332665, 'gamma': 0.10079674419032192, 'min_child_weight': 3}. Best is trial 25 with value: 275.5025570194427.
[I 2023-08-24 21:38:38,925] Trial 28 finished with value: 281.68758609493074 and parameters: {'n_estimators': 494, 'learning_rate': 0.26667080219581796, 'max_depth': 9, 'reg_lambda': 0.3955741076271662, 'subsample': 0.8035261084722158, 'colsample_bytree': 0.4085984459335157, 'gamma': 0.09049864514427342, 'min_child_weight': 5}. Best is trial 25 with value: 275.5025570194427.
[I 2023-08-24 21:38:39,425] Trial 29 finished with value: 275.6900705486461 and parameters: {'n_estimators': 282, 'learning_rate': 0.20936031602434668, 'max_depth': 10, 'reg_lambda': 3.0367839991884775, 'subsample': 0.5322684529622277, 'colsample_bytree': 0.5876945827454486, 'gamma': 0.23307106313059275, 'min_child_weight': 2}. Best is trial 25 with value: 275.5025570194427.
[I 2023-08-24 21:38:39,601] Trial 30 finished with value: 401.44545640152705 and parameters: {'n_estimators': 177, 'learning_rate': 0.17883057170141184, 'max_depth': 1, 'reg_lambda': 4.643900813619784, 'subsample': 0.540534835334475, 'colsample_bytree': 0.6246675512139492, 'gamma': 0.21824306753568923, 'min_child_weight': 1}. Best is trial 25 with value: 275.5025570194427.
[I 2023-08-24 21:38:40,344] Trial 31 finished with value: 277.867999252204 and parameters: {'n_estimators': 239, 'learning_rate': 0.2899816404555501, 'max_depth': 10, 'reg_lambda': 96.30202645971627, 'subsample': 0.6506099735441224, 'colsample_bytree': 0.5510752754114139, 'gamma': 0.11159469814966103, 'min_child_weight': 2}. Best is trial 25 with value: 275.5025570194427.
[I 2023-08-24 21:38:40,693] Trial 32 finished with value: 284.09638450488035 and parameters: {'n_estimators': 261, 'learning_rate': 0.2850161025138182, 'max_depth': 10, 'reg_lambda': 13.225471472498723, 'subsample': 0.48507616764254324, 'colsample_bytree': 0.4786306016311545, 'gamma': 0.1478015903765593, 'min_child_weight': 2}. Best is trial 25 with value: 275.5025570194427.
[I 2023-08-24 21:38:41,181] Trial 33 finished with value: 284.02449769757555 and parameters: {'n_estimators': 154, 'learning_rate': 0.22148739899871925, 'max_depth': 10, 'reg_lambda': 75.89645312218785, 'subsample': 0.5460212213871909, 'colsample_bytree': 0.5929846413449087, 'gamma': 0.23003173012123695, 'min_child_weight': 2}. Best is trial 25 with value: 275.5025570194427.
[I 2023-08-24 21:38:41,542] Trial 34 finished with value: 276.2260803683879 and parameters: {'n_estimators': 297, 'learning_rate': 0.25076774650315714, 'max_depth': 9, 'reg_lambda': 2.65428437367735, 'subsample': 0.44007934933988513, 'colsample_bytree': 0.721766422438874, 'gamma': 0.0071927918546931535, 'min_child_weight': 4}. Best is trial 25 with value: 275.5025570194427.
[I 2023-08-24 21:38:41,990] Trial 35 finished with value: 282.4661253345403 and parameters: {'n_estimators': 350, 'learning_rate': 0.1767013516396914, 'max_depth': 9, 'reg_lambda': 3.1742554253117876, 'subsample': 0.4165778181819284, 'colsample_bytree': 0.7764575819410393, 'gamma': 0.020781583381783625, 'min_child_weight': 7}. Best is trial 25 with value: 275.5025570194427.
[I 2023-08-24 21:38:42,273] Trial 36 finished with value: 290.15312721386965 and parameters: {'n_estimators': 103, 'learning_rate': 0.07721483024568497, 'max_depth': 8, 'reg_lambda': 1.3370912181105128, 'subsample': 0.3737258696517997, 'colsample_bytree': 0.717477720803216, 'gamma': 0.05810721329475632, 'min_child_weight': 4}. Best is trial 25 with value: 275.5025570194427.
[I 2023-08-24 21:38:42,647] Trial 37 finished with value: 272.37475032470087 and parameters: {'n_estimators': 305, 'learning_rate': 0.2381898419257861, 'max_depth': 9, 'reg_lambda': 1.9065146239404211, 'subsample': 0.5221174207401648, 'colsample_bytree': 0.8762984254488593, 'gamma': 0.004010513787127894, 'min_child_weight': 4}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:43,136] Trial 38 finished with value: 280.12048001613664 and parameters: {'n_estimators': 307, 'learning_rate': 0.21337407547227186, 'max_depth': 9, 'reg_lambda': 11.547755675294265, 'subsample': 0.5183498856410114, 'colsample_bytree': 0.8963812516815987, 'gamma': 0.0451076267933046, 'min_child_weight': 4}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:43,433] Trial 39 finished with value: 371.4022896331864 and parameters: {'n_estimators': 443, 'learning_rate': 0.24336880351803475, 'max_depth': 1, 'reg_lambda': 1.6507534271268642, 'subsample': 0.4386022675478367, 'colsample_bytree': 0.8592784742251566, 'gamma': 0.011470087351944016, 'min_child_weight': 5}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:43,858] Trial 40 finished with value: 280.6790762259918 and parameters: {'n_estimators': 220, 'learning_rate': 0.16100496028900804, 'max_depth': 5, 'reg_lambda': 24.550129422102575, 'subsample': 0.33417657433368564, 'colsample_bytree': 0.7689380635912174, 'gamma': 0.40209068686922367, 'min_child_weight': 6}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:44,252] Trial 41 finished with value: 277.0701698284005 and parameters: {'n_estimators': 338, 'learning_rate': 0.25217257527915415, 'max_depth': 8, 'reg_lambda': 0.9693110898244365, 'subsample': 0.5153604543915742, 'colsample_bytree': 0.9952695289511038, 'gamma': 0.12781663359684567, 'min_child_weight': 1}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:44,590] Trial 42 finished with value: 278.73267155029913 and parameters: {'n_estimators': 109, 'learning_rate': 0.20078413896214647, 'max_depth': 9, 'reg_lambda': 0.10674894137170153, 'subsample': 0.5508664522137815, 'colsample_bytree': 0.594777992016356, 'gamma': 0.06605277698912926, 'min_child_weight': 3}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:45,015] Trial 43 finished with value: 282.07098158060455 and parameters: {'n_estimators': 400, 'learning_rate': 0.24657782447673263, 'max_depth': 10, 'reg_lambda': 6.103356665635787, 'subsample': 0.4809487764376245, 'colsample_bytree': 0.621151949442091, 'gamma': 0.0018296475595689227, 'min_child_weight': 7}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:45,374] Trial 44 finished with value: 275.9902749626102 and parameters: {'n_estimators': 293, 'learning_rate': 0.29054059816715466, 'max_depth': 9, 'reg_lambda': 2.4690770909365156, 'subsample': 0.5608948576972114, 'colsample_bytree': 0.8219089347358931, 'gamma': 0.18527725422289396, 'min_child_weight': 4}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:45,911] Trial 45 finished with value: 278.2358324641845 and parameters: {'n_estimators': 503, 'learning_rate': 0.18885857508387224, 'max_depth': 10, 'reg_lambda': 2.885304058807542, 'subsample': 0.44657598106617113, 'colsample_bytree': 0.9350757762228091, 'gamma': 0.19604221125315546, 'min_child_weight': 5}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:46,262] Trial 46 finished with value: 273.3197491931675 and parameters: {'n_estimators': 303, 'learning_rate': 0.2381434832850326, 'max_depth': 9, 'reg_lambda': 0.9170187868915959, 'subsample': 0.5665630707800245, 'colsample_bytree': 0.8207941113034839, 'gamma': 0.1644446202191974, 'min_child_weight': 4}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:46,805] Trial 47 finished with value: 275.7896135075567 and parameters: {'n_estimators': 448, 'learning_rate': 0.12639956482211503, 'max_depth': 10, 'reg_lambda': 0.6496140036210473, 'subsample': 0.5520082526060971, 'colsample_bytree': 0.8485012737783704, 'gamma': 0.26647701245627253, 'min_child_weight': 4}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:47,370] Trial 48 finished with value: 280.731460071631 and parameters: {'n_estimators': 437, 'learning_rate': 0.13011883130369223, 'max_depth': 10, 'reg_lambda': 0.6341484468576574, 'subsample': 0.5163982502138322, 'colsample_bytree': 0.8669448200804477, 'gamma': 0.35247544961367067, 'min_child_weight': 4}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:47,794] Trial 49 finished with value: 312.37157096190174 and parameters: {'n_estimators': 390, 'learning_rate': 0.17056927086730603, 'max_depth': 2, 'reg_lambda': 0.09748189311358584, 'subsample': 0.6967247520079654, 'colsample_bytree': 0.9355849762069983, 'gamma': 0.29175686180344945, 'min_child_weight': 3}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:48,374] Trial 50 finished with value: 279.9216376239767 and parameters: {'n_estimators': 477, 'learning_rate': 0.15908176645190095, 'max_depth': 10, 'reg_lambda': 10.968270358718243, 'subsample': 0.563529129991677, 'colsample_bytree': 0.7800248824971729, 'gamma': 0.16169786751097523, 'min_child_weight': 6}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:49,090] Trial 51 finished with value: 276.920177699937 and parameters: {'n_estimators': 300, 'learning_rate': 0.19396407129958174, 'max_depth': 9, 'reg_lambda': 1.1947078614301847, 'subsample': 0.6153196415320673, 'colsample_bytree': 0.8205224942916335, 'gamma': 0.2369218093813788, 'min_child_weight': 5}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:49,513] Trial 52 finished with value: 278.9103173705132 and parameters: {'n_estimators': 534, 'learning_rate': 0.23405464391550176, 'max_depth': 9, 'reg_lambda': 0.2587732282051522, 'subsample': 0.5867201264325743, 'colsample_bytree': 0.8170601578030543, 'gamma': 0.2051491288034059, 'min_child_weight': 4}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:50,006] Trial 53 finished with value: 280.68965360319584 and parameters: {'n_estimators': 221, 'learning_rate': 0.29977928938080906, 'max_depth': 10, 'reg_lambda': 0.7368019200474047, 'subsample': 0.5605163883806396, 'colsample_bytree': 0.914299865442843, 'gamma': 0.1729820449064228, 'min_child_weight': 4}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:50,479] Trial 54 finished with value: 280.515519226228 and parameters: {'n_estimators': 370, 'learning_rate': 0.2270103421041519, 'max_depth': 8, 'reg_lambda': 6.735108579951337, 'subsample': 0.5192029089464596, 'colsample_bytree': 0.8599881758342562, 'gamma': 0.25368036912755343, 'min_child_weight': 6}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:51,096] Trial 55 finished with value: 280.62058947969143 and parameters: {'n_estimators': 328, 'learning_rate': 0.2005036917735287, 'max_depth': 9, 'reg_lambda': 37.987044795428524, 'subsample': 0.6182122298207031, 'colsample_bytree': 0.9717514784598534, 'gamma': 0.3131643481474765, 'min_child_weight': 4}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:51,527] Trial 56 finished with value: 280.691068019915 and parameters: {'n_estimators': 274, 'learning_rate': 0.26578042327756657, 'max_depth': 7, 'reg_lambda': 0.025467347544414282, 'subsample': 0.5705724575600597, 'colsample_bytree': 0.8094067515837664, 'gamma': 0.13535312221925078, 'min_child_weight': 5}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:51,980] Trial 57 finished with value: 276.5312536897828 and parameters: {'n_estimators': 434, 'learning_rate': 0.16033977873245345, 'max_depth': 6, 'reg_lambda': 0.16580531822920738, 'subsample': 0.49462009150775604, 'colsample_bytree': 0.875006180201521, 'gamma': 0.19996875764557964, 'min_child_weight': 3}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:52,765] Trial 58 finished with value: 277.600112415381 and parameters: {'n_estimators': 723, 'learning_rate': 0.12886378456576864, 'max_depth': 8, 'reg_lambda': 0.5113504925817178, 'subsample': 0.6780280202771731, 'colsample_bytree': 0.830722786112942, 'gamma': 0.2714369694016817, 'min_child_weight': 3}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:53,337] Trial 59 finished with value: 276.7184363684666 and parameters: {'n_estimators': 548, 'learning_rate': 0.1939476193602629, 'max_depth': 10, 'reg_lambda': 1.8322501897740024, 'subsample': 0.6316176792745133, 'colsample_bytree': 0.7663258801674331, 'gamma': 0.057917606483270684, 'min_child_weight': 7}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:53,431] Trial 60 finished with value: 344.3452628109257 and parameters: {'n_estimators': 34, 'learning_rate': 0.2396604851252634, 'max_depth': 4, 'reg_lambda': 19.120124180084584, 'subsample': 0.7245851051448065, 'colsample_bytree': 0.6586506874465721, 'gamma': 0.11174226855939313, 'min_child_weight': 10}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:53,806] Trial 61 finished with value: 276.30949159713475 and parameters: {'n_estimators': 311, 'learning_rate': 0.2604652597224021, 'max_depth': 9, 'reg_lambda': 2.6265534445482395, 'subsample': 0.4646861384344494, 'colsample_bytree': 0.6995432817694202, 'gamma': 0.0376418404015602, 'min_child_weight': 4}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:54,324] Trial 62 finished with value: 278.39791109099497 and parameters: {'n_estimators': 257, 'learning_rate': 0.22775122304063786, 'max_depth': 9, 'reg_lambda': 4.5376733552435615, 'subsample': 0.5337099257377079, 'colsample_bytree': 0.7346200213351805, 'gamma': 0.07798271611081192, 'min_child_weight': 4}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:54,778] Trial 63 finished with value: 278.53257955171597 and parameters: {'n_estimators': 204, 'learning_rate': 0.2998830687192381, 'max_depth': 10, 'reg_lambda': 0.8547176909335645, 'subsample': 0.5705780825444046, 'colsample_bytree': 0.8432622020953461, 'gamma': 0.15308439016523745, 'min_child_weight': 5}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:55,155] Trial 64 finished with value: 280.69599510980794 and parameters: {'n_estimators': 136, 'learning_rate': 0.2696551523598431, 'max_depth': 8, 'reg_lambda': 0.05559420946866001, 'subsample': 0.4980412278037374, 'colsample_bytree': 0.8028210058065437, 'gamma': 0.08447852354920836, 'min_child_weight': 9}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:55,620] Trial 65 finished with value: 278.3085900602172 and parameters: {'n_estimators': 287, 'learning_rate': 0.17264878031439726, 'max_depth': 9, 'reg_lambda': 0.25195206569271933, 'subsample': 0.6036516408069978, 'colsample_bytree': 0.648795891292979, 'gamma': 0.035072697553006946, 'min_child_weight': 3}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:56,066] Trial 66 finished with value: 275.2802119411209 and parameters: {'n_estimators': 370, 'learning_rate': 0.24421624392569272, 'max_depth': 10, 'reg_lambda': 6.476931872900835, 'subsample': 0.46548126943654394, 'colsample_bytree': 0.748356747847216, 'gamma': 0.1768609043088442, 'min_child_weight': 2}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:56,687] Trial 67 finished with value: 276.49561530817067 and parameters: {'n_estimators': 365, 'learning_rate': 0.20975576255138204, 'max_depth': 10, 'reg_lambda': 8.09185023167126, 'subsample': 0.5330742753603226, 'colsample_bytree': 0.7401663293480729, 'gamma': 0.19267392689791513, 'min_child_weight': 2}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:57,493] Trial 68 finished with value: 279.0145881710485 and parameters: {'n_estimators': 418, 'learning_rate': 0.14753215782247736, 'max_depth': 10, 'reg_lambda': 36.574045919385604, 'subsample': 0.5793408093035645, 'colsample_bytree': 0.8967608390724144, 'gamma': 0.2298237081623546, 'min_child_weight': 1}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:57,939] Trial 69 finished with value: 277.97393168490237 and parameters: {'n_estimators': 467, 'learning_rate': 0.1859597984694104, 'max_depth': 10, 'reg_lambda': 0.5688452516316728, 'subsample': 0.47386602687237295, 'colsample_bytree': 0.8411797130159789, 'gamma': 0.12957135233406117, 'min_child_weight': 2}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:58,449] Trial 70 finished with value: 278.87633078164356 and parameters: {'n_estimators': 382, 'learning_rate': 0.22446331050966178, 'max_depth': 9, 'reg_lambda': 4.999293896368495, 'subsample': 0.6244906487431542, 'colsample_bytree': 0.6724936876150527, 'gamma': 0.17943828745426213, 'min_child_weight': 8}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:58,804] Trial 71 finished with value: 273.32235171993074 and parameters: {'n_estimators': 331, 'learning_rate': 0.25292176115681114, 'max_depth': 9, 'reg_lambda': 2.040776371516244, 'subsample': 0.44933132543947985, 'colsample_bytree': 0.697632301572057, 'gamma': 0.11003329443230392, 'min_child_weight': 3}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:59,216] Trial 72 finished with value: 273.0269132753464 and parameters: {'n_estimators': 332, 'learning_rate': 0.2720516988038279, 'max_depth': 9, 'reg_lambda': 1.5673274956831946, 'subsample': 0.5004402696324488, 'colsample_bytree': 0.6886060600229759, 'gamma': 0.10616719767043592, 'min_child_weight': 3}. Best is trial 37 with value: 272.37475032470087.
[I 2023-08-24 21:38:59,615] Trial 73 finished with value: 271.38115701747483 and parameters: {'n_estimators': 341, 'learning_rate': 0.2570556611802518, 'max_depth': 10, 'reg_lambda': 1.2436839685308987, 'subsample': 0.46378166789894804, 'colsample_bytree': 0.6956806584041942, 'gamma': 0.12525719229146448, 'min_child_weight': 2}. Best is trial 73 with value: 271.38115701747483.
[I 2023-08-24 21:39:00,085] Trial 74 finished with value: 277.38662158572106 and parameters: {'n_estimators': 332, 'learning_rate': 0.2618987838958368, 'max_depth': 9, 'reg_lambda': 15.747295512252679, 'subsample': 0.4086118948092844, 'colsample_bytree': 0.6901931626640395, 'gamma': 0.10895990733481478, 'min_child_weight': 2}. Best is trial 73 with value: 271.38115701747483.
[I 2023-08-24 21:39:00,395] Trial 75 finished with value: 276.73891466270464 and parameters: {'n_estimators': 242, 'learning_rate': 0.2317299999763361, 'max_depth': 8, 'reg_lambda': 1.7083895351429395, 'subsample': 0.4606362096025495, 'colsample_bytree': 0.6454192030733011, 'gamma': 0.07762021516038076, 'min_child_weight': 2}. Best is trial 73 with value: 271.38115701747483.
[I 2023-08-24 21:39:00,851] Trial 76 finished with value: 272.63177075133814 and parameters: {'n_estimators': 345, 'learning_rate': 0.2095150106503206, 'max_depth': 10, 'reg_lambda': 7.352843140606574, 'subsample': 0.4993525712050281, 'colsample_bytree': 0.5678296111168029, 'gamma': 0.1486634801438333, 'min_child_weight': 1}. Best is trial 73 with value: 271.38115701747483.
[I 2023-08-24 21:39:01,229] Trial 77 finished with value: 274.3205793942853 and parameters: {'n_estimators': 352, 'learning_rate': 0.263078021990381, 'max_depth': 10, 'reg_lambda': 7.845257048289091, 'subsample': 0.4969711434932156, 'colsample_bytree': 0.6294055515971203, 'gamma': 0.14182710091773962, 'min_child_weight': 1}. Best is trial 73 with value: 271.38115701747483.
[I 2023-08-24 21:39:01,878] Trial 78 finished with value: 278.7768751475913 and parameters: {'n_estimators': 332, 'learning_rate': 0.2652613726156818, 'max_depth': 10, 'reg_lambda': 48.43127167247482, 'subsample': 0.4992120706329156, 'colsample_bytree': 0.6259854631386206, 'gamma': 0.16182496492910714, 'min_child_weight': 1}. Best is trial 73 with value: 271.38115701747483.
[I 2023-08-24 21:39:02,283] Trial 79 finished with value: 279.3269282804628 and parameters: {'n_estimators': 356, 'learning_rate': 0.24579136108561317, 'max_depth': 10, 'reg_lambda': 10.103744253525749, 'subsample': 0.42090434950837374, 'colsample_bytree': 0.6928602016606208, 'gamma': 0.10044304317745088, 'min_child_weight': 3}. Best is trial 73 with value: 271.38115701747483.
[I 2023-08-24 21:39:03,058] Trial 80 finished with value: 277.69627061358625 and parameters: {'n_estimators': 413, 'learning_rate': 0.27798203799959964, 'max_depth': 10, 'reg_lambda': 22.745069322038372, 'subsample': 0.45791378349219974, 'colsample_bytree': 0.7454326633554741, 'gamma': 0.042544109852274734, 'min_child_weight': 1}. Best is trial 73 with value: 271.38115701747483.
[I 2023-08-24 21:39:03,492] Trial 81 finished with value: 273.3138566101228 and parameters: {'n_estimators': 185, 'learning_rate': 0.21268464596233164, 'max_depth': 9, 'reg_lambda': 5.777670982375906, 'subsample': 0.47629347247967335, 'colsample_bytree': 0.5681639218994864, 'gamma': 0.1411599347843201, 'min_child_weight': 1}. Best is trial 73 with value: 271.38115701747483.
[I 2023-08-24 21:39:03,939] Trial 82 finished with value: 274.84975450645464 and parameters: {'n_estimators': 178, 'learning_rate': 0.21544079030424487, 'max_depth': 9, 'reg_lambda': 6.2634201220314845, 'subsample': 0.49352189672070906, 'colsample_bytree': 0.5582464826168826, 'gamma': 0.14408931798994673, 'min_child_weight': 1}. Best is trial 73 with value: 271.38115701747483.
[I 2023-08-24 21:39:04,519] Trial 83 finished with value: 280.9565294395466 and parameters: {'n_estimators': 180, 'learning_rate': 0.2116284669916624, 'max_depth': 9, 'reg_lambda': 63.35506813992003, 'subsample': 0.5037962276182846, 'colsample_bytree': 0.6153697372190052, 'gamma': 0.1433919958208231, 'min_child_weight': 1}. Best is trial 73 with value: 271.38115701747483.
[I 2023-08-24 21:39:04,861] Trial 84 finished with value: 278.47587497048175 and parameters: {'n_estimators': 93, 'learning_rate': 0.18249917800062776, 'max_depth': 9, 'reg_lambda': 1.2777079510533818, 'subsample': 0.4890350430784967, 'colsample_bytree': 0.5704969571043933, 'gamma': 0.06744215095494857, 'min_child_weight': 1}. Best is trial 73 with value: 271.38115701747483.
[I 2023-08-24 21:39:05,219] Trial 85 finished with value: 280.32431714420653 and parameters: {'n_estimators': 249, 'learning_rate': 0.22138049806343044, 'max_depth': 5, 'reg_lambda': 18.785482335330038, 'subsample': 0.4338373186930039, 'colsample_bytree': 0.5512246110927226, 'gamma': 0.11309111415838193, 'min_child_weight': 1}. Best is trial 73 with value: 271.38115701747483.
[I 2023-08-24 21:39:05,498] Trial 86 finished with value: 276.20092957926636 and parameters: {'n_estimators': 156, 'learning_rate': 0.27745835831625093, 'max_depth': 8, 'reg_lambda': 3.8747812301855515, 'subsample': 0.40013898041541524, 'colsample_bytree': 0.5245922972675459, 'gamma': 0.14609357061510322, 'min_child_weight': 1}. Best is trial 73 with value: 271.38115701747483.
[I 2023-08-24 21:39:05,720] Trial 87 finished with value: 285.44692862484254 and parameters: {'n_estimators': 77, 'learning_rate': 0.2031602205093676, 'max_depth': 7, 'reg_lambda': 9.407830808645649, 'subsample': 0.43484863334767204, 'colsample_bytree': 0.6080486624078841, 'gamma': 0.02471123226172514, 'min_child_weight': 2}. Best is trial 73 with value: 271.38115701747483.
[I 2023-08-24 21:39:06,329] Trial 88 finished with value: 281.3319205565176 and parameters: {'n_estimators': 208, 'learning_rate': 0.1666949035699387, 'max_depth': 8, 'reg_lambda': 30.875704032065002, 'subsample': 0.3768758500117065, 'colsample_bytree': 0.6309724328962547, 'gamma': 0.08366715208850783, 'min_child_weight': 2}. Best is trial 73 with value: 271.38115701747483.
[I 2023-08-24 21:39:06,709] Trial 89 finished with value: 269.2243781486146 and parameters: {'n_estimators': 315, 'learning_rate': 0.29982249490952867, 'max_depth': 9, 'reg_lambda': 3.8209153905781825, 'subsample': 0.4760189439492335, 'colsample_bytree': 0.7098440970336479, 'gamma': 0.209688895363623, 'min_child_weight': 1}. Best is trial 89 with value: 269.2243781486146.
[I 2023-08-24 21:39:07,066] Trial 90 finished with value: 277.6214528888539 and parameters: {'n_estimators': 273, 'learning_rate': 0.29316623131900954, 'max_depth': 9, 'reg_lambda': 1.8704233342146532, 'subsample': 0.5162545925081636, 'colsample_bytree': 0.6659807312716542, 'gamma': 0.21314061070936458, 'min_child_weight': 3}. Best is trial 89 with value: 269.2243781486146.
[I 2023-08-24 21:39:07,445] Trial 91 finished with value: 274.58950060020464 and parameters: {'n_estimators': 313, 'learning_rate': 0.25057031221364373, 'max_depth': 9, 'reg_lambda': 4.929872027258234, 'subsample': 0.4838363148816927, 'colsample_bytree': 0.7063505962722701, 'gamma': 0.12587747764812254, 'min_child_weight': 1}. Best is trial 89 with value: 269.2243781486146.
[I 2023-08-24 21:39:07,897] Trial 92 finished with value: 279.46095594891375 and parameters: {'n_estimators': 315, 'learning_rate': 0.2550774152499502, 'max_depth': 9, 'reg_lambda': 1.0006885247246193, 'subsample': 0.4751517688643813, 'colsample_bytree': 0.7107848457883194, 'gamma': 0.11938031091527934, 'min_child_weight': 1}. Best is trial 89 with value: 269.2243781486146.
[I 2023-08-24 21:39:08,257] Trial 93 finished with value: 268.49106580604536 and parameters: {'n_estimators': 388, 'learning_rate': 0.27591688091279815, 'max_depth': 8, 'reg_lambda': 4.114256239834311, 'subsample': 0.4488691100208403, 'colsample_bytree': 0.6797203555585086, 'gamma': 0.05485185212903983, 'min_child_weight': 1}. Best is trial 93 with value: 268.49106580604536.
[I 2023-08-24 21:39:08,611] Trial 94 finished with value: 271.80698869450566 and parameters: {'n_estimators': 351, 'learning_rate': 0.28262009546200967, 'max_depth': 8, 'reg_lambda': 0.3378461141864501, 'subsample': 0.4487850790428043, 'colsample_bytree': 0.6852330136206313, 'gamma': 0.05550074982972157, 'min_child_weight': 1}. Best is trial 93 with value: 268.49106580604536.
[I 2023-08-24 21:39:08,938] Trial 95 finished with value: 271.37443546323993 and parameters: {'n_estimators': 405, 'learning_rate': 0.27913303701950304, 'max_depth': 8, 'reg_lambda': 1.5366493622447066, 'subsample': 0.4486042623748286, 'colsample_bytree': 0.676803279951354, 'gamma': 0.054702701999234764, 'min_child_weight': 2}. Best is trial 93 with value: 268.49106580604536.
[I 2023-08-24 21:39:09,249] Trial 96 finished with value: 279.8647793017947 and parameters: {'n_estimators': 398, 'learning_rate': 0.2789584395705792, 'max_depth': 7, 'reg_lambda': 0.3492011644777729, 'subsample': 0.39402751076835313, 'colsample_bytree': 0.6795071210550525, 'gamma': 0.05233425116768309, 'min_child_weight': 2}. Best is trial 93 with value: 268.49106580604536.
[I 2023-08-24 21:39:09,577] Trial 97 finished with value: 271.273272689704 and parameters: {'n_estimators': 419, 'learning_rate': 0.22936684290306802, 'max_depth': 8, 'reg_lambda': 0.8360523902986643, 'subsample': 0.4141749828009126, 'colsample_bytree': 0.6572846232591825, 'gamma': 0.004692807666286096, 'min_child_weight': 1}. Best is trial 93 with value: 268.49106580604536.
[I 2023-08-24 21:39:09,898] Trial 98 finished with value: 275.94337536405857 and parameters: {'n_estimators': 425, 'learning_rate': 0.297868963638918, 'max_depth': 8, 'reg_lambda': 0.3704185670629215, 'subsample': 0.42299890292783365, 'colsample_bytree': 0.6599775577746472, 'gamma': 0.00505308964269518, 'min_child_weight': 1}. Best is trial 93 with value: 268.49106580604536.
[I 2023-08-24 21:39:10,201] Trial 99 finished with value: 280.4633100303054 and parameters: {'n_estimators': 491, 'learning_rate': 0.27784237696610226, 'max_depth': 8, 'reg_lambda': 2.9988365404132584, 'subsample': 0.35183934580358445, 'colsample_bytree': 0.5978341994723186, 'gamma': 0.026413181848724987, 'min_child_weight': 1}. Best is trial 93 with value: 268.49106580604536.
Best Hyperparameters: {'n_estimators': 388, 'learning_rate': 0.27591688091279815, 'max_depth': 8, 'reg_lambda': 4.114256239834311, 'subsample': 0.4488691100208403, 'colsample_bytree': 0.6797203555585086, 'gamma': 0.05485185212903983, 'min_child_weight': 1}
Best Score (MAE): 268.49106580604536
🤖 Pulling it all together
¶objCol = ['TruckSID',
'Engine',
'Transmission',
'FrontAxlePosition',
'FrameRails',
'Liner',
'FrontEndExt',
'Cab',
'RearAxels',
'RearSusp',
'FrontSusp',
'RearWheels',
'RearTires',
'FrontWheels',
'FrontTires',
'TagAxle',
'EngineFamily',
'TransmissionFamily']
class DropTargets(BaseEstimator, TransformerMixin):
def __init__(self, targets = ['ActualWeightBack','ActualWeightFront','ActualWeightTotal']):
self.targets = targets
def fit(self, X, y=None):
return self
def transform(self, X):
try:
X = X.drop(self.targets, axis = 1)
except:
pass
return X
class convertCatColumnsToString(BaseEstimator, TransformerMixin):
def __init__(self, obj_Col = objCol):
self.obj_Col = obj_Col
def fit(self, X, y=None):
return self
def transform(self, X):
X[self.obj_Col] = X[self.obj_Col].astype('string')
return X
class replaceSpace(BaseEstimator, TransformerMixin):
def fit(self, X, y=None):
return self
def transform(self, X):
all_cols = X.columns
for col in all_cols:
try:
X[col] = X[col].str.replace(' ', '')
except:
pass
return X
class replaceDot(BaseEstimator, TransformerMixin):
def __init__(self, obj_Col = ['EngineFamily']):
self.cols = obj_Col
def fit(self, X, y=None):
return self
def transform(self, X):
for col in self.cols:
X[col] = X[col].str.replace('.', '')
return X
# Step 1: Custom transformer to convert string columns to category
class StringToCategory(BaseEstimator, TransformerMixin):
def fit(self, X, y=None):
return self
def transform(self, X):
string_cols = X.select_dtypes(include=['string']).columns
X[string_cols] = X[string_cols].astype('category')
return X
class CheckUpperAndLowerBound(BaseEstimator, TransformerMixin):
def __init__(self, upper_bound = 344, lower_bound = -164,variable = 'Overhang',replacement_value = 90):
self.upper_bound = upper_bound
self.lower_bound = lower_bound
self.variable = variable
self.replacement_value = replacement_value
def fit(self, X, y=None):
return self
def transform(self, X):
X.loc[X[self.variable] < self.lower_bound, self.variable] = self.replacement_value
X.loc[X[self.variable] > self.upper_bound , self.variable] = self.replacement_value
return X
class DataFrameScaler(BaseEstimator, TransformerMixin):
def __init__(self):
self.scaler = StandardScaler()
self.columns = None
def fit(self, X, y=None):
self.scaler.fit(X, y)
self.columns = X.columns
return self
def transform(self, X):
X_scaled = self.scaler.transform(X)
return pd.DataFrame(X_scaled, columns=self.columns)
class FeatureEngineer(BaseEstimator, TransformerMixin):
def __init__(self):
self.columns = selected_features
def fit(self, X, y=None):
return self
def transform(self, X):
X['Engine_Transmission'] = X['Engine'] * X['Transmission']
X['TransmissionFamily_EngineFamily'] = X['TransmissionFamily'] * X['EngineFamily']
# Polynomial Features for numeric variables:
X['WheelBase_squared'] = X['WheelBase'] ** 2
X['Overhang_squared'] = X['Overhang'] ** 2
# Ratio Features:
X['Front_to_Rear_Wheels'] = X['FrontWheels'] / (X['RearWheels'] + 0.001) # Add a small number to avoid division by zero
X['WheelBase_to_Overhang'] = X['WheelBase'] / (X['Overhang'] + 0.001)
# Aggregated Features for TransmissionFamily and EngineFamily:
X['avg_WheelBase_per_TransmissionFamily'] = X.groupby('TransmissionFamily')['WheelBase'].transform('mean')
X['avg_Overhang_per_EngineFamily'] = X.groupby('EngineFamily')['Overhang'].transform('mean')
# Features based on other columns:
X['sum_WheelBase_per_Engine'] = X.groupby('Engine')['WheelBase'].transform('sum')
X['max_Overhang_per_Transmission'] = X.groupby('Transmission')['Overhang'].transform('max')
X = X[self.columns]
return X
from numpy import log1p
class LogTransform(BaseEstimator, TransformerMixin):
def __init__(self, columns=None):
self.columns = columns
def fit(self, X, y=None):
return self
def transform(self, X):
# If specific columns are provided
if self.columns:
for col in self.columns:
X[col] = log1p(X[col]) # Using log1p to also handle 0 values
else:
# If no specific columns are provided, apply to the whole DataFrame
X = log1p(X)
return X
class DropIDColumn(BaseEstimator, TransformerMixin):
def __init__(self, columns=None):
self.columns = columns
def fit(self, X, y=None):
return self
def transform(self, X):
X = X.drop(self.columns, axis = 1)
return X
training = pd.read_csv('training.csv',delimiter=';')
target_ActualWeightFront = training['ActualWeightFront']
target_ActualWeightTotal = training['ActualWeightTotal']
training
| TruckSID | ActualWeightFront | ActualWeightBack | ActualWeightTotal | Engine | Transmission | FrontAxlePosition | WheelBase | Overhang | FrameRails | Liner | FrontEndExt | Cab | RearAxels | RearSusp | FrontSusp | RearWheels | RearTires | FrontWheels | FrontTires | TagAxle | EngineFamily | TransmissionFamily | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 31081 | 11280 | 8030 | 19310 | 1012011 | 2700028 | 3690005 | 249 | 104 | 403012 | 404002 | 4070004 | 5000002 | 330444 | 3500004 | 3700002 | 9140014 | 933469 | 9050015 | 930469 | 3P1998 | 101D100 | 270C25 |
| 1 | 30580 | 10720 | 6660 | 17380 | 1012011 | 2700022 | 3690005 | 183 | 68 | 403012 | 404002 | 4070004 | 5000004 | 330507 | 3500004 | 3700011 | 9142001 | 933469 | 9050031 | 930821 | 3P1998 | 101D100 | 270C24 |
| 2 | 31518 | 11040 | 6230 | 17270 | 1012001 | 2700022 | 3690005 | 216 | 68 | 403012 | 404002 | 4070004 | 5000001 | 330444 | 3500004 | 3700002 | 9140014 | 933062 | 9050015 | 930469 | 3P1998 | 101D97 | 270C24 |
| 3 | 31816 | 11210 | 7430 | 18640 | 1012002 | 2700028 | 3690005 | 219 | 104 | 403012 | 404002 | 4070004 | 5000002 | 330444 | 3500004 | 3700002 | 9140014 | 933062 | 9050015 | 930469 | 3P1998 | 101D97 | 270C25 |
| 4 | 30799 | 11910 | 7510 | 19420 | 1012019 | 2700028 | 3690005 | 231 | 104 | 403012 | 404002 | 4070004 | 5000001 | 330444 | 3500004 | 3700011 | 9142001 | 933469 | 9050037 | 930469 | 3P1998 | 101D102 | 270C25 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2639 | 34891 | 10110 | 9830 | 19940 | 1012012 | 2700024 | 3690005 | 210 | 104 | 403012 | 404998 | 4070004 | 5000002 | 3300041 | 3500003 | 3700002 | 9140016 | 933469 | 9050015 | 930469 | 3P1998 | 101D100 | 270C24 |
| 2640 | 25021 | 11150 | 6700 | 17850 | 1012002 | 2700028 | 3690005 | 210 | 74 | 403012 | 404002 | 4070004 | 5000003 | 330444 | 3500004 | 3700002 | 9140014 | 933062 | 9050015 | 930469 | 3P1998 | 101D97 | 270C25 |
| 2641 | 33141 | 10850 | 7020 | 17870 | 1012002 | 2700028 | 3690005 | 222 | 80 | 403012 | 404998 | 4070004 | 5000002 | 330444 | 3500014 | 3700002 | 9140014 | 933062 | 9050015 | 930469 | 3P1998 | 101D97 | 270C25 |
| 2642 | 40311 | 10380 | 6850 | 17230 | 1012011 | 2700022 | 3690005 | 222 | 56 | 403012 | 404002 | 4070004 | 5000001 | 330444 | 3500004 | 3700002 | 9142003 | 933469 | 9052003 | 930469 | 3P1998 | 101D100 | 270C24 |
| 2643 | 33401 | 9820 | 8760 | 18580 | 1012011 | 2700022 | 3690005 | 198 | 104 | 403012 | 404998 | 4070004 | 5000002 | 330507 | 3500004 | 3700002 | 9142001 | 933469 | 9050037 | 930469 | 3P1998 | 101D100 | 270C24 |
2644 rows × 23 columns
Building the Front Model
¶prep_fe_pipeline_front = Pipeline([
('drop_Targets', DropTargets()),
('convert_cat_to_string', convertCatColumnsToString()),
('replace_dot', replaceDot()),
('replace_space', replaceSpace()),
('str_to_cat', StringToCategory()),
('upper_lower', CheckUpperAndLowerBound()),
('target_encode', ce.TargetEncoder(handle_unknown='value', handle_missing='value')),
('scale', DataFrameScaler()),
('FE',FeatureEngineer()),
])
formattedData = prep_fe_pipeline_front.fit_transform(training,target_ActualWeightFront)
formattedData
| Engine | Transmission | FrontAxlePosition | WheelBase | Overhang | FrameRails | Liner | FrontEndExt | Cab | RearAxels | RearSusp | FrontSusp | RearWheels | RearTires | FrontWheels | FrontTires | TagAxle | EngineFamily | TransmissionFamily | Engine_Transmission | TransmissionFamily_EngineFamily | WheelBase_squared | Overhang_squared | Front_to_Rear_Wheels | WheelBase_to_Overhang | avg_Overhang_per_EngineFamily | sum_WheelBase_per_Engine | max_Overhang_per_Transmission | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.775150 | 1.199613 | 0.085077 | 2.620206 | 0.867729 | 0.647233 | 1.118745 | -0.061616 | -0.946623 | 0.928864 | 0.251651 | 0.402826 | 1.262227 | -0.240839 | 1.224191 | -0.359090 | -0.15539 | -0.774318 | 1.059240 | -0.929880 | -0.820189 | 6.865481 | 0.752953 | 0.969098 | 3.016139 | -0.089933 | -44.907440 | 2.373981 |
| 1 | -0.775150 | -0.917003 | 0.085077 | -1.545882 | -1.391651 | 0.647233 | 1.118745 | -0.061616 | 1.193472 | -0.427545 | 0.251651 | -0.687423 | -0.317981 | -0.240839 | 0.801290 | 1.816748 | -0.15539 | -0.774318 | -0.944073 | 0.710815 | 0.731013 | 2.389750 | 1.936692 | -2.527878 | 1.111625 | -0.089933 | -44.907440 | 1.244292 |
| 2 | 0.194674 | -0.917003 | 0.085077 | 0.537162 | -1.391651 | 0.647233 | 1.118745 | -0.061616 | -0.408809 | 0.928864 | 0.251651 | 0.402826 | 1.262227 | 0.944712 | 1.224191 | -0.359090 | -0.15539 | 0.283715 | -0.944073 | -0.178516 | -0.267848 | 0.288543 | 1.936692 | 0.969098 | -0.386267 | -0.073598 | 16.114870 | 1.244292 |
| 3 | 0.270655 | 1.199613 | 0.085077 | 0.726530 | 0.867729 | 0.647233 | 1.118745 | -0.061616 | -0.946623 | 0.928864 | 0.251651 | 0.402826 | 1.262227 | 0.944712 | 1.224191 | -0.359090 | -0.15539 | 0.283715 | 1.059240 | 0.324682 | 0.300522 | 0.527846 | 0.752953 | 0.969098 | 0.836314 | -0.073598 | 130.089260 | 2.373981 |
| 4 | 1.580971 | 1.199613 | 0.085077 | 1.484000 | 0.867729 | 0.647233 | 1.118745 | -0.061616 | -0.408809 | 0.928864 | 0.251651 | -0.687423 | -0.317981 | -0.240839 | -0.856580 | -0.359090 | -0.15539 | 1.551007 | 1.059240 | 1.896553 | 1.642889 | 2.202257 | 0.752953 | 2.702304 | 1.708244 | 0.409488 | -7.553028 | 2.373981 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2639 | -0.175652 | -0.231394 | 0.085077 | 0.158427 | 0.867729 | 0.647233 | -0.906376 | -0.061616 | -0.946623 | -1.712860 | -3.511279 | 0.402826 | 1.691276 | -0.240839 | 1.224191 | -0.359090 | -0.15539 | -0.774318 | -0.944073 | 0.040645 | 0.731013 | 0.025099 | 0.752953 | 0.723399 | 0.182367 | -0.089933 | 0.633708 | 0.867729 |
| 2640 | 0.270655 | 1.199613 | 0.085077 | 0.158427 | -1.015088 | 0.647233 | 1.118745 | -0.061616 | 2.089485 | 0.928864 | 0.251651 | 0.402826 | 1.262227 | 0.944712 | 1.224191 | -0.359090 | -0.15539 | 0.283715 | 1.059240 | 0.324682 | 0.300522 | 0.025099 | 1.030403 | 0.969098 | -0.156226 | -0.073598 | 130.089260 | 2.373981 |
| 2641 | 0.270655 | 1.199613 | 0.085077 | 0.915898 | -0.638524 | 0.647233 | -0.906376 | -0.061616 | -0.946623 | 0.928864 | 0.517296 | 0.402826 | 1.262227 | 0.944712 | 1.224191 | -0.359090 | -0.15539 | 0.283715 | 1.059240 | 0.324682 | 0.300522 | 0.838868 | 0.407713 | 0.969098 | -1.436647 | -0.073598 | 130.089260 | 2.373981 |
| 2642 | -0.775150 | -0.917003 | 0.085077 | 0.915898 | -2.144777 | 0.647233 | 1.118745 | -0.061616 | -0.408809 | 0.928864 | 0.251651 | 0.402826 | -0.721544 | -0.240839 | -1.037440 | -0.359090 | -0.15539 | -0.774318 | -0.944073 | 0.710815 | 0.731013 | 0.838868 | 4.600069 | 1.439802 | -0.427235 | -0.089933 | -44.907440 | 1.244292 |
| 2643 | -0.775150 | -0.917003 | 0.085077 | -0.599043 | 0.867729 | 0.647233 | -0.906376 | -0.061616 | -0.946623 | -0.427545 | 0.251651 | 0.402826 | -0.317981 | -0.240839 | -0.856580 | -0.359090 | -0.15539 | -0.774318 | -0.944073 | 0.710815 | 0.731013 | 0.358853 | 0.752953 | 2.702304 | -0.689563 | -0.089933 | -44.907440 | 1.244292 |
2644 rows × 28 columns
X_train, X_test, y_train, y_test = train_test_split(formattedData, target_ActualWeightFront, test_size=0.3, random_state=42)
#{'n_estimators': 495, 'learning_rate': 0.133354970082411, 'max_depth': 7, 'reg_lambda': 4.53385562551496e-08, 'subsample': 0.8141582328181602, 'colsample_bytree': 0.7732232759096854, 'gamma': 0.34364076800046645, 'min_child_weight': 1}
model = XGBRegressor(n_estimators=495, learning_rate=0.133354970082411, max_depth=7, reg_lambda=4.53385562551496e-08, subsample=0.8141582328181602, colsample_bytree=0.7732232759096854, gamma=0.34364076800046645, min_child_weight=1, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
print("MAE: ", mae)
MAE: 128.6261733509131
def plotReliabilityGraphs(y_test,y_pred,plot_name):
residuals = y_test - y_pred
plt.scatter(y_pred, residuals)
plt.axhline(0, color='red', linestyle='--')
plt.xlabel('Predicted')
plt.ylabel('Residuals')
plt.title(f'Residual Plot for {plot_name}')
plt.show()
plt.scatter(y_test, y_pred)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'k--', lw=4)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title(f'Actual vs. Predicted for {plot_name}')
plt.show()
plt.hist(residuals, bins=20)
plt.xlabel('Residuals')
plt.ylabel('Frequency')
plt.title(f'Histogram of Residuals for {plot_name}')
plt.show()
stats.probplot(residuals, plot=plt)
plt.show()
plotReliabilityGraphs(y_test,y_pred,'XGBoost without bagging for Front')
xgb_model = XGBRegressor(n_estimators=495, learning_rate=0.133354970082411, max_depth=7, reg_lambda=4.53385562551496e-08, subsample=0.8141582328181602, colsample_bytree=0.7732232759096854, gamma=0.34364076800046645, min_child_weight=1, random_state=42)
# Wrapping the model within a BaggingRegressor
bagging_model_front = BaggingRegressor(base_estimator=xgb_model, n_estimators=3, random_state=0)
bagging_model_front.fit(X_train, y_train)
y_pred = bagging_model_front.predict(X_test)
print("MAE: ", mean_absolute_error(y_test, y_pred))
plotReliabilityGraphs(y_test,y_pred, 'XGBoost with Bagging Regressor for Front')
MAE: 132.40994224260075
Building the Total Model
¶prep_fe_pipeline_total = Pipeline([
('drop_Targets', DropTargets()),
('convert_cat_to_string', convertCatColumnsToString()),
('replace_dot', replaceDot()),
('replace_space', replaceSpace()),
('str_to_cat', StringToCategory()),
('upper_lower', CheckUpperAndLowerBound()),
('target_encode', ce.TargetEncoder(handle_unknown='value', handle_missing='value')),
('scale', DataFrameScaler()),
('DropIdColumn', DropIDColumn(['TruckSID'])),
])
formattedData = prep_fe_pipeline_total.fit_transform(training,target_ActualWeightTotal)
X_train, X_test, y_train, y_test = train_test_split(formattedData, target_ActualWeightTotal, test_size=0.3, random_state=42)
X_train
| Engine | Transmission | FrontAxlePosition | WheelBase | Overhang | FrameRails | Liner | FrontEndExt | Cab | RearAxels | RearSusp | FrontSusp | RearWheels | RearTires | FrontWheels | FrontTires | TagAxle | EngineFamily | TransmissionFamily | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1385 | -0.680991 | -0.776702 | 0.085077 | -0.788411 | 0.867729 | -1.325093 | -0.890816 | -0.061616 | 0.923031 | 0.596275 | 0.26966 | 0.690944 | 0.357058 | 0.521415 | 0.195839 | 0.194185 | -0.117407 | -0.685647 | -0.944073 |
| 297 | -0.556468 | -0.776702 | 0.085077 | 0.537162 | -1.391651 | 0.636769 | 1.128937 | -0.061616 | 0.923031 | 0.596275 | 0.26966 | 0.690944 | 0.326938 | -0.675918 | 0.985025 | 0.194185 | -0.117407 | -0.072367 | -0.944073 |
| 598 | -0.680991 | -0.776702 | 0.085077 | 0.158427 | -1.015088 | -1.325093 | 1.128937 | -0.061616 | -1.345265 | 0.308506 | 0.26966 | 0.690944 | 0.357058 | 0.521415 | 0.452175 | 0.538199 | -0.117407 | -0.685647 | -0.944073 |
| 1644 | -0.145792 | 1.184255 | 0.085077 | 2.809574 | -0.638524 | 0.636769 | 1.128937 | -0.061616 | 0.691917 | 0.596275 | 0.26966 | 0.690944 | 1.520929 | 0.521415 | 0.985025 | 0.194185 | -0.117407 | -0.072367 | 1.059240 |
| 751 | -0.145792 | 1.184255 | 0.085077 | -1.356514 | 0.867729 | 0.636769 | 1.128937 | -0.061616 | -0.536155 | 0.596275 | 0.26966 | -1.450352 | 0.326938 | -0.675918 | 0.985025 | 0.194185 | -0.117407 | -0.072367 | 1.059240 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1638 | -0.680991 | -0.776702 | 0.085077 | -0.030941 | -0.261961 | 0.636769 | 1.128937 | -0.061616 | 0.923031 | 0.308506 | 0.26966 | 0.690944 | -2.236636 | -3.161144 | -2.556963 | -3.647514 | -0.117407 | -0.685647 | -0.944073 |
| 1095 | -0.680991 | -0.345998 | 0.085077 | 0.158427 | -0.261961 | 0.636769 | -0.890816 | -0.061616 | -1.345265 | -2.609984 | 0.26966 | 0.690944 | -0.667824 | 0.521415 | -1.016930 | 0.194185 | -0.117407 | -0.685647 | 1.059240 |
| 1130 | -0.680991 | 1.184255 | 0.085077 | 0.726530 | 0.867729 | 0.636769 | -0.890816 | -0.061616 | -1.345265 | 0.596275 | 0.26966 | 0.690944 | -0.967372 | -0.675918 | -1.016930 | 0.194185 | -0.117407 | -0.685647 | 1.059240 |
| 1294 | 1.928706 | 1.184255 | 0.085077 | -0.220308 | 0.867729 | 0.636769 | -0.890816 | -0.061616 | 0.923031 | 0.596275 | 0.26966 | 0.690944 | 0.357058 | 0.521415 | 0.452175 | 0.538199 | -0.117407 | 1.886329 | 1.059240 |
| 860 | -0.145792 | 1.184255 | 0.085077 | 2.809574 | -0.638524 | 0.636769 | 1.128937 | -0.061616 | -0.536155 | 0.596275 | 0.26966 | -1.450352 | 1.520929 | 0.521415 | 0.985025 | 0.194185 | -0.117407 | -0.072367 | 1.059240 |
1850 rows × 19 columns
#{'n_estimators': 993, 'learning_rate': 0.24702617076976288, 'max_depth': 8, 'reg_lambda': 1.3184725672621053e-09, 'subsample': 0.7745537544177631, 'colsample_bytree': 0.5084493201928549, 'gamma': 0.9975234538045155, 'min_child_weight': 9}
xgb_model = XGBRegressor(n_estimators=993, learning_rate=0.24702617076976288, max_depth=8, reg_lambda=1.3184725672621053e-09, subsample=0.7745537544177631, colsample_bytree=0.5084493201928549, gamma=0.9975234538045155, min_child_weight=9,random_state=42)
# Wrapping the model within a BaggingRegressor
bagging_model_total = BaggingRegressor(base_estimator=xgb_model, n_estimators=10)
bagging_model_total.fit(X_train, y_train)
y_pred = bagging_model_total.predict(X_test)
print('MAE:', mean_absolute_error(y_test, y_pred))
plotReliabilityGraphs(y_test,y_pred, 'XGBoost with Bagging Regressor for Total')
MAE: 280.8806896449937
📝 Predicting The Test Data
¶testData = pd.read_csv('testing.csv',delimiter=';')
test_data_for_total = prep_fe_pipeline_total.transform(testData)
test_data_for_front = prep_fe_pipeline_front.transform(testData)
total_pred = bagging_model_total.predict(test_data_for_total).astype(int)
front_pred = bagging_model_front.predict(test_data_for_front).astype(int)
back_pred = (total_pred - front_pred).astype(int)
finalDf = pd.DataFrame({'TruckSID':testData['TruckSID'],
'PredictedWeightFront':front_pred.flatten(),
'PredictedWeightBack':back_pred.flatten(),
'PredictedWeightTotal':total_pred.flatten()})
finalDf.to_csv('finalProduct2.csv', index=False)
pred = bagging_model_front.predict(test_data_for_front)
pred
array([10919.966 , 12096.512 , 10939.284 , 10506.544 , 11287.884 ,
10595.5205, 10220.859 , 10477.777 , 12141.856 , 11448.593 ,
10399.335 , 10952.543 , 10437.177 , 11210.692 , 11105.292 ,
12421.809 , 10791.581 , 10299.2295, 10771.863 , 10978.629 ,
10405.955 , 10120.337 , 10497.404 , 10458.731 , 10328.438 ,
10890.548 , 11443.997 , 10590.777 , 9879.163 , 11238.274 ,
11500.628 , 11062.325 , 11210.958 , 10892.074 , 11195.583 ,
11416.176 , 10432.015 , 10686.352 , 10258.469 , 12125.937 ,
9755.555 , 10345.53 , 10659.266 , 10118.015 , 11042.6455,
11104.478 , 12502.105 , 10090.04 , 12294.133 , 10257.591 ,
9685.571 , 12060.245 , 11984.082 , 10441.053 , 10975.647 ,
11283.767 , 10399.78 , 11319.277 , 10353.947 , 10884.288 ,
9793.021 , 10812.937 , 10474.135 , 10733.358 , 11154.466 ,
10251.309 , 11073.312 , 11319.277 , 10541.645 , 12487.707 ,
9871.865 , 10528.238 , 10587.112 , 10803.531 , 10679.147 ,
10249.28 , 9638.108 , 11986.81 , 10946.532 , 11215.629 ,
10943.289 , 10054.1455, 12110.292 , 12604.263 , 10950.159 ,
12242.367 , 12539.137 , 10298.261 , 10553.04 , 11418.55 ,
9663.673 , 9863.523 , 10494.723 , 11211.98 , 10089.114 ,
11275.733 , 10922.423 , 10712.058 , 11950.847 , 10182.765 ,
10409.296 , 12146.058 , 12217.683 , 11013.109 , 11320.855 ,
10229.929 , 10440.531 , 12448.56 , 11109.714 , 11375.681 ,
11615.446 , 11209.7 , 10577.309 , 10113.798 , 10442.873 ,
12489.892 , 11611.749 , 10500.301 , 10522.988 , 10706.741 ,
12545.0205, 10215.619 , 11119.649 , 10428.082 , 10362.093 ,
10645.121 , 10535.516 , 10974.109 , 12127.628 , 10840.29 ,
10895.418 , 12213.531 , 11933.724 , 10726.531 , 11233.477 ,
10805.665 , 10932.632 , 10440.915 , 11873.37 , 12364.22 ,
10298.261 , 10347.611 , 10311.133 , 11188.011 , 11953.847 ,
11194.606 , 10671.03 , 12331.706 , 11805.839 , 12181.5205,
10435.081 , 10592.04 , 10910.126 , 9950.243 , 10828.354 ,
9984.835 , 10923.172 , 10905.212 , 10414.692 , 11241.73 ,
10236.226 , 10502.255 , 10712.278 , 11320.047 , 10928.023 ,
11068.253 , 11258.628 , 10942.898 , 11227.575 , 10922.894 ,
11518.956 , 11704.18 , 10600.339 , 10816.6875, 10527.824 ,
10600.856 , 11447.004 , 10762.624 , 11057.544 , 11285.917 ,
11213.368 , 10565.848 , 10901.038 , 10809.657 , 11371.082 ,
10580.542 , 10228.4795, 10331.227 , 10476.05 , 10892.074 ,
10291.181 , 10782.22 , 11014.968 , 11629.813 , 11107.984 ,
10723.262 , 10009.741 , 11117.778 , 11186.168 , 10380.872 ,
11450.66 , 10313.824 , 10498.339 , 10348.397 , 11094.718 ,
10387.672 , 10158.429 , 9852.298 , 10871.276 , 11212.719 ,
11339.108 , 10512.33 , 11446.0625, 12448.586 , 10378.921 ,
10446.622 , 10209.477 , 9918.968 , 10782.991 , 10241.102 ,
10139.665 , 11381.718 , 11371.847 , 12090.797 , 11392.059 ,
10298.512 , 10574.2 , 10864.028 , 10798.855 , 10650.13 ,
11314.433 , 10383.981 , 10992.247 , 11086.757 , 10309.046 ,
11155.608 , 10336.306 , 11241.73 , 10457.317 , 9934.751 ,
10542.458 , 10970.79 , 10406.591 , 11117.19 , 10726.996 ,
10842.741 , 11021.902 , 10711.824 , 11575.61 , 10130.535 ,
10409.489 , 11140.508 , 10753.207 , 11326.642 , 12007.185 ,
11519.409 , 10925.5205, 12673.227 , 11194.009 , 12001.875 ,
11935.579 , 10445.171 , 12440.55 , 10652.847 , 11904.636 ,
11028.758 , 9766.553 , 9775.177 , 11951.427 , 9712.925 ,
10406.255 , 10347.946 , 10744.247 , 11234.356 , 10858.819 ,
10855.664 , 9683.858 , 11018.55 , 11844.402 , 10237.765 ,
11062.593 , 11238.733 , 10808.125 , 10509.999 , 11254.702 ,
11094.718 , 11991.761 , 11176.398 , 9532.988 , 10670.311 ,
10838.297 , 10321.95 , 12571.164 , 10174.665 , 11234.542 ,
11201.497 , 10840.671 , 10477.527 , 10479.625 , 11171.995 ,
10316.495 , 10520.117 , 10162.434 , 10527.025 , 11821.87 ,
10201.593 , 11948.042 , 9777.745 , 11234.147 , 10167.571 ,
11247.284 , 11292.335 , 11319.2295, 10118.669 , 10809.776 ,
9569.882 , 11105.292 , 10571.884 , 9731.54 , 10847.536 ,
10484.281 , 10280.562 , 11964.133 , 10431.091 , 10668.087 ,
11030.599 , 11197.617 , 10069.817 , 12453.798 , 10493.122 ,
10407.923 , 11240.601 , 10650.065 , 11719.039 , 11722.56 ,
11135.184 , 10899.083 , 10105.503 , 10636.351 , 9620.647 ,
11832.202 , 10385.669 , 10819.923 , 11911.664 , 12182.831 ,
10332.53 , 10848.527 , 10072.395 , 9990.332 , 9753.204 ,
10182.733 , 9824.64 , 11545.544 , 10942.941 , 9811.895 ,
11971.077 , 11019.892 , 10638.319 , 11146.005 , 12374.159 ,
10201.402 , 10479.762 , 11205.86 , 10362.062 , 10532.257 ,
10139.105 , 10519.307 , 11021.902 , 10997.742 , 9831.267 ,
10215.674 , 11321.489 , 11107.711 , 9772.89 , 10623.058 ,
12023.945 , 10987.46 , 11042.6455, 9681.184 , 10189.015 ,
10760.724 , 11616.148 , 10583.976 , 10699.116 , 11532.472 ,
11795.669 , 10604.569 , 10442.695 , 10337.302 , 10445.481 ,
10798.185 , 10416.203 , 11002.853 , 11070.677 , 10376.858 ,
10455.386 , 9827.785 , 11829.641 , 11952.667 , 10596.542 ,
10357.839 , 10906.133 , 10930.882 , 10830.777 , 10582.032 ,
11068.253 , 10639.761 , 10430.18 , 9986.272 , 12345.976 ,
10260.767 , 10028.018 , 10890.471 , 10443.397 , 11241.73 ,
11000.927 , 9726.772 , 10478.368 , 10325.742 , 9734.253 ,
12277.368 , 11228.052 , 10859.292 , 10456.904 , 11120.489 ,
10265.901 , 11022.301 , 10507.554 , 10870.124 , 10845.883 ,
11078.046 , 11883.828 , 10628.03 , 10530.761 , 10140.081 ,
10664.711 , 10202.171 , 9957.669 , 12463.61 , 12266.022 ,
12100.003 , 10781.351 , 10212.599 , 10763.234 , 11501.456 ,
12076.026 , 11861.535 , 11927.992 , 10635.94 , 10420.773 ,
9851.723 , 11190.356 , 10253.171 , 10436.681 , 10803.178 ,
10491.449 , 10926.934 , 11071.071 , 10467.522 , 9662.821 ,
11088.07 , 10380.595 , 11145.278 , 11214.013 , 11012.378 ,
10785.871 , 9470.16 , 9918.968 , 10197.958 , 10448.28 ,
12292.165 , 10130.377 , 12054.294 , 9991.591 , 10776.443 ,
10250.77 , 10401.868 , 10859.571 , 11191.157 , 10135.177 ,
10505.474 , 10565.773 , 9853.741 , 10170.591 , 10634.277 ,
11844.589 , 10579.923 , 10563.228 , 10339.052 , 10593.448 ,
11511.59 , 9761.8125, 10285.814 , 11798.851 , 10457.605 ,
9586.562 , 10394.813 , 11052.87 , 11048.843 , 10427.5205,
11446.0625, 10565.501 , 9683.114 , 12456.698 , 12412.724 ,
11331.583 , 10603.062 , 10409.11 , 10226.426 , 10190.03 ,
10253.73 , 10830.573 , 9768.13 , 10246.091 , 11244.481 ,
9883.207 , 11124.95 , 10382.316 , 11035.515 , 11140.66 ,
11193.458 , 10793.0205, 12220.059 , 10590.89 , 10760.504 ,
11075.352 , 12440.55 , 11933.378 , 10632.737 , 10409.406 ,
10673.077 , 10393.237 , 10543.015 , 10523.575 , 10997.517 ,
10428.228 , 10752.324 , 12170.819 , 10595.883 , 10681.974 ,
9771.753 , 11171.995 , 10758.4795, 9755.555 , 10375.554 ,
11245.954 , 11053.892 , 11238.274 , 11464.532 , 10865.571 ,
10989.3125, 10386.331 , 11169.336 , 10334.659 , 11840.231 ,
11229.935 , 11106.302 , 11153.632 , 10361.114 , 10467.274 ,
10125.533 , 10545.473 , 11216.897 , 10736.565 , 11477.685 ,
10145.339 , 10612.435 , 10723.931 , 11127.897 , 10346.552 ,
11029.719 , 10208.599 , 11256.382 , 10348.397 , 10075.114 ,
10270.925 , 10699.655 , 11265.269 , 10236.844 , 11916.2 ,
9792.417 , 10452.243 , 11233.3545, 11245.109 , 10346.577 ,
10416.522 , 12428.445 , 10533.233 , 10135.177 , 11073.809 ,
10313.824 , 10611.409 , 11361.7295, 11093.6455, 12470.4375,
9684.1045, 12004.034 , 11106.302 , 10697.472 , 10557.765 ,
10799.005 , 11797.204 , 10839.129 , 11759.844 , 11403.724 ,
10484.587 , 9839.794 , 10929.0205, 11028.879 , 9869.619 ,
11892.719 , 10221.862 , 10307.069 , 11599.585 , 10815.79 ,
11235.246 , 10389.739 , 11462.776 , 10568.532 , 10600.635 ,
10915.706 , 12092.575 , 10913.819 , 11153.039 , 10356.06 ,
10291.077 , 12130.426 , 10218.161 , 10969.405 , 11074.315 ,
11031.575 , 9496.825 , 9863.523 , 10954.483 , 10430.741 ,
10785.808 , 11256.152 , 11546.942 , 12113.148 , 10177.84 ,
11251.78 , 10793.695 , 10433.417 , 12539.137 , 12005.831 ,
9851.672 , 11472.621 , 10451.226 , 11211.019 , 11948.147 ,
10960.692 , 10236.772 , 10054.004 , 9853.823 , 11466.215 ,
11383.43 , 10494.957 , 11036.384 , 10353.695 , 12036.261 ,
10399.237 , 10863.066 , 11173.993 , 12195.125 , 11134.862 ,
11214.169 , 9532.257 , 10218.542 , 10476.782 , 10926.903 ,
10387.601 , 10041.191 , 11508.375 , 10975.726 , 12194.327 ,
11001.196 , 10163.793 , 10422.716 , 10196.39 , 11319.277 ,
11851.106 , 10905.927 , 11001.242 , 12666.985 , 11911.664 ,
11512.675 , 10269.776 , 11191.667 , 10678.403 , 12491.099 ,
11123.073 , 10209.746 , 11321.489 , 10726.512 , 12530.016 ,
11244.452 , 11341.183 , 11714.813 , 11018.446 , 11504.731 ,
10639.897 , 11847.306 , 10799.57 , 11339.964 , 10847.511 ,
10263.716 , 10897.173 , 11990.747 , 11287.805 , 10146.464 ,
11019.785 , 11560.8 , 11165.882 , 12585.551 , 10848.289 ,
12230.457 , 10434.004 , 10969.949 , 10993.44 , 9790.655 ,
10479.762 , 10479.762 , 10479.762 , 10479.762 , 10479.762 ,
10479.762 , 10479.762 , 10479.762 , 10479.762 , 10479.762 ,
10479.762 , 10479.762 , 10479.762 , 10479.762 , 10479.762 ,
10479.762 , 10479.762 , 10479.762 , 10479.762 , 10479.762 ,
10479.762 , 10479.762 , 10479.762 , 10479.762 , 10479.762 ,
10479.762 , 10479.762 , 10479.762 , 10479.762 , 10479.762 ,
10479.762 , 10479.762 , 10479.762 , 10479.762 , 10479.762 ,
10479.762 , 10479.762 , 10479.762 , 10479.762 , 10479.762 ,
10479.762 , 10479.762 , 10871.276 , 10479.762 , 10871.276 ,
10479.762 , 10871.276 , 10479.762 , 10871.276 , 10479.762 ,
10871.276 , 10479.762 , 10871.276 , 10479.762 , 10871.276 ,
10479.762 , 10871.276 , 10479.762 , 10871.276 , 9990.332 ,
10479.762 , 10871.276 , 9990.332 , 10479.762 , 10871.276 ,
9990.332 , 10479.762 , 10871.276 , 9990.332 , 10479.762 ,
10871.276 , 9990.332 , 10479.762 , 10871.276 , 9990.332 ,
10479.762 , 10871.276 , 9990.332 , 10479.762 , 10871.276 ,
9990.332 , 10479.762 , 10871.276 , 9990.332 , 10479.762 ,
10871.276 , 9990.332 , 10479.762 , 10871.276 , 9990.332 ,
10479.762 , 10871.276 , 9990.332 , 10479.762 , 10871.276 ,
9990.332 , 10479.762 , 10871.276 , 9990.332 , 10479.762 ,
10871.276 , 9990.332 , 10479.762 , 10871.276 , 9777.745 ,
9990.332 , 10479.762 , 10871.276 , 9777.745 , 9990.332 ,
10479.762 , 10871.276 , 9777.745 , 9990.332 , 10479.762 ,
10659.266 , 10871.276 , 9777.745 , 10668.087 , 9990.332 ,
10479.762 , 10659.266 , 10871.276 , 9777.745 , 10668.087 ,
9990.332 , 10479.762 , 10659.266 , 10871.276 , 9777.745 ,
10668.087 , 9990.332 , 10479.762 , 10659.266 , 10871.276 ,
9777.745 , 10668.087 , 9990.332 , 10479.762 , 9831.267 ,
10659.266 , 10871.276 , 9777.745 , 10668.087 , 9990.332 ,
10479.762 , 9831.267 , 10659.266 , 10871.276 , 9777.745 ,
10668.087 , 9990.332 , 10479.762 , 9831.267 , 10348.397 ,
10659.266 , 10871.276 , 11238.733 , 9777.745 , 10668.087 ,
9990.332 , 10479.762 , 9831.267 , 10348.397 , 10659.266 ,
10871.276 , 11238.733 , 9777.745 , 10668.087 , 9990.332 ,
10479.762 , 9831.267 , 10348.397 , 10659.266 , 10871.276 ,
11238.733 , 9777.745 , 10668.087 , 9990.332 , 10479.762 ,
9831.267 , 10348.397 , 10659.266 , 10871.276 , 11238.733 ,
9777.745 , 10668.087 , 9990.332 , 10479.762 , 9831.267 ,
10348.397 , 10659.266 , 10871.276 , 11238.733 , 11991.761 ,
9777.745 , 10668.087 , 9990.332 , 10479.762 , 9831.267 ,
10348.397 , 10659.266 , 10871.276 , 11238.733 , 11991.761 ,
9777.745 , 10668.087 , 9990.332 , 10479.762 , 9831.267 ,
10348.397 , 10659.266 , 10871.276 , 11238.733 , 11991.761 ,
9777.745 , 10668.087 , 9990.332 , 10479.762 , 9831.267 ,
10348.397 , 10659.266 , 10871.276 , 11238.733 , 11991.761 ,
9777.745 , 10668.087 , 9990.332 , 10479.762 , 9831.267 ,
10348.397 , 11287.884 , 10659.266 , 10871.276 , 11238.733 ,
11991.761 , 9777.745 , 10668.087 , 9990.332 , 10479.762 ,
9831.267 , 10348.397 ], dtype=float32)